Shians / NanoMethViz

Apache License 2.0

Where can I find, or how can I make, methy_data.bgz? #26

Closed: YingziZhang-github closed this issue 1 year ago

YingziZhang-github commented 1 year ago

Hi Shian,

Thank you very much for developing NanoMethViz. I am using it on my nanopolish output (.tsv format) and following the tutorial you wrote.

I learned from the NanoMethViz issues and the tutorial that I need to convert the .tsv output using create_tabix_file, but it is not clear to me how to use it. In the Importing data part of the tutorial:

methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)

Do you know where I can find my methy_data.bgz, or how I can prepare the file?

Besides, I am kind of lost when following the steps for importing and exporting data and for differential analysis. I would really appreciate it if you could share more details about how the example files relate to each other across these steps.

Thank you very much.

Yingzi

YingziZhang-github commented 1 year ago

Hi Shian,

I read more of the R help and learned that "methy_data.bgz" is the output tabix file written by create_tabix_file.

I have two other questions: 1) When I ran exon_tibble <- get_exons_homo_sapiens(), the following messages appeared:

Loading required package: Homo.sapiens
Loading required package: OrganismDbi
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene

The reference genome I used in the upstream steps is hg38. Do you know what I should do to use hg38 in NanoMethViz?

2) I created my tabix file with

create_tabix_file(
  c(input_files),
  methy_tabix,
  c(samples)
)

and then converted it with

bsseq <- methy_to_bsseq(methy_tabix, out_folder = tempdir(), verbose = TRUE)

It reported the following error:

[2023-04-18 18:34:41] creating intermediate files...
[2023-04-18 18:34:41] parsing chr11...
[2023-04-18 18:34:43] samples found: 
Error in data.frame(sample = samples, file_path = path(out_folder, paste0(samples,  : 
  arguments imply differing number of rows: 0, 1

Could you suggest how I can fix it? The number of input_files matches the number of samples.

Looking forward to your reply! Thank you.

Yingzi

Shians commented 1 year ago

Hi Yingzi,

I do need to make more explicit functions for different genome versions. You will need to construct your own hg38 annotation in the same format as the provided hg19 one; otherwise the genomic coordinates will not line up. If you cannot do this yourself, I may get around to it sometime next week.

I'm not entirely sure what is causing the error in the conversion to a bsseq object. It looks like no sample names were detected in the bgzip-tabix file. Could you run gunzip -c methy_data.bgz | head in the terminal to check whether the contents look correct?
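The check suggested above can be extended into a small sketch. It prints the first few records and the distinct sample names found in the file. If the sample list comes back empty or blank, that would match the "samples found:" message in the error.

This sketch rests on two assumptions. First, the sample name is the first tab-separated column, matching the column order returned by methy_col_names(). Second, `check_methy_tabix` is a hypothetical helper, not part of NanoMethViz. `gunzip -c` can read the file because bgzip output is gzip-compatible.

```shell
#!/bin/sh
# Hypothetical helper (not part of NanoMethViz): summarise a bgzip-tabix file.
# ASSUMPTION: column 1 holds the sample name, as in methy_col_names().
check_methy_tabix() {
  tabix_file="$1"

  # Show the first few records to confirm the columns look sensible.
  echo "First lines:"
  gunzip -c "$tabix_file" | head -n 5

  # List each distinct value of column 1 (the assumed sample column).
  # Note: this scans the whole file, so it can be slow on large data.
  echo "Distinct sample names:"
  gunzip -c "$tabix_file" | cut -f1 | sort -u
}
```

For example, running `check_methy_tabix methy_data.bgz` should list sample1 and sample2 if the samples argument was applied when the file was created.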

YingziZhang-github commented 1 year ago

Hi Shians,

Thank you very much for answering!

I figured out both the hg38 problem and the bsseq object issue. Thank you very much for the help. I will close this issue and open two more issues about differential analysis and plotting. Thanks again!

Yingzi