GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License

Question regarding supplementary 6 table #93

Closed kwonej0617 closed 1 year ago

kwonej0617 commented 1 year ago

Hi, @chrishendra93!

I was wondering which replicates (rep1, rep2, rep3) of HEK293T WT and KO you used to generate supplementary table 6. Also, did you apply any filtering before or after running m6anet to get the supplementary table 6 results, for example a minimum coverage or removal of short reads? I really appreciate your help!

chrishendra93 commented 1 year ago

hi @kwonej0617 , are you talking about supplementary table 2? We used all three replicates for both WT and KO to generate the supplementary table, using the pooling option of xPore so that we can get as many positions as possible.

Supplementary table 6 is the m6Anet prediction on both the WT and KO cell lines. For that, we used rep1, if I recall correctly, and a minimum coverage of 20 reads per site.
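
A coverage cutoff like the one mentioned above can be sketched as a simple filter over a read-count table. This is an illustration only, not m6Anet's internal code: the column names `transcript_id`, `transcript_position`, and `n_reads` are an assumption about the layout of data.readcount, and `filter_by_coverage` is a hypothetical helper.

```python
import csv
import io


def filter_by_coverage(handle, min_reads=20):
    """Return the rows whose n_reads is at least min_reads.

    Assumes a CSV with header columns transcript_id,
    transcript_position and n_reads (an assumption, not
    confirmed in this thread).
    """
    reader = csv.DictReader(handle)
    return [row for row in reader if int(row["n_reads"]) >= min_reads]


# Toy input standing in for a real data.readcount file.
example = io.StringIO(
    "transcript_id,transcript_position,n_reads\n"
    "ENST0001,100,35\n"
    "ENST0001,250,12\n"
    "ENST0002,77,20\n"
)
kept = filter_by_coverage(example)
print([(row["transcript_id"], row["transcript_position"]) for row in kept])
```

The comparison is inclusive, so a site with exactly 20 reads is kept; whether m6Anet treats the boundary the same way is not stated in the thread.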

kwonej0617 commented 1 year ago

Hi, @chrishendra93 I really appreciate your response.

  1. Is the filtering for the minimum coverage of 20 reads per site already included in m6anet?
  2. Also, I know '--num_iterations=NUM' is not a required option in m6anet inference, but how should I choose the iteration number if I want to set it? What does the iteration number mean? Also, did you set --num_iterations=NUM when generating supplementary table 6?
  3. Lastly, could you please tell me which guppy version you used to generate your supplementary table 6?

Thank you for your help!

chrishendra93 commented 1 year ago

  1. Yes, this is done by default in m6anet inference.
  2. The default in the older version of m6Anet is 5 while in the latest release this is increased to 1000 for a more stable inference result.
  3. The HEK293T cell line data was already preprocessed and was obtained from the xPore paper. I did not re-basecall the samples, so that information should be available from the xPore paper: https://www.nature.com/articles/s41587-021-00949-w

kwonej0617 commented 1 year ago

Hi @chrishendra93 !

Thank you for your reply. Could you share your preprocessed data for HEK293T WT and KO rep1? If you have already deposited it, could you share the link?

Thank you so much!

chrishendra93 commented 1 year ago

hi @kwonej0617 , the preprocessed data, i.e. data.json, is available through Code Ocean; the link is in the paper. Let me know if you have trouble with Code Ocean.

Thanks!

kwonej0617 commented 1 year ago

Hi @chrishendra93!

Thank you for your reply! I downloaded hek293_data.readcount.tar.gz from Code Ocean and compared it with my data.readcount. My data.readcount was generated by the following process:

For the comparison, I used transcript id + position as the key (that is, I checked whether each transcript id + position combination from your data.readcount is also found in my data.readcount). However, many positions in your data do not overlap with those in my data. image
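
The comparison described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: both files are taken to be CSVs with `transcript_id` and `transcript_position` header columns, and `load_sites` and `overlap_summary` are hypothetical helper names, not part of m6anet or xPore.

```python
import csv
import io


def load_sites(handle):
    """Collect (transcript_id, transcript_position) keys from a read-count CSV.

    The column names are an assumption about the data.readcount layout.
    """
    reader = csv.DictReader(handle)
    return {(row["transcript_id"], row["transcript_position"]) for row in reader}


def overlap_summary(sites_a, sites_b):
    """Count shared and file-specific sites and report the Jaccard index."""
    shared = sites_a & sites_b
    union = sites_a | sites_b
    return {
        "shared": len(shared),
        "only_a": len(sites_a - sites_b),
        "only_b": len(sites_b - sites_a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy inputs standing in for the two data.readcount files.
theirs = io.StringIO(
    "transcript_id,transcript_position,n_reads\n"
    "ENST0001,100,35\n"
    "ENST0001,250,22\n"
    "ENST0002,77,40\n"
)
mine = io.StringIO(
    "transcript_id,transcript_position,n_reads\n"
    "ENST0001,100,31\n"
    "ENST0003,12,28\n"
)
summary = overlap_summary(load_sites(theirs), load_sites(mine))
print(summary)
```

Reporting the counts on both sides, rather than only the shared fraction, helps show whether one file is roughly a subset of the other or whether the two sets diverge in both directions. For real data, any text-mode file handle can be passed to `load_sites`.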

Which step or factor do you think makes such differences between the two data? I would really appreciate it if you could give me your input.

Also, could you please share your minimap2 output (bam or sam file) for WT and KO replicate 1?

Thank you!

chrishendra93 commented 1 year ago

Hi @kwonej0617, this looks really different. Can I check with you the command that you used for running minimap2? Did you use the command that xPore used in the paper (minimap2 -ax map-ont -uf --secondary=no)? Also, can you provide your readcount file?

kwonej0617 commented 1 year ago

Hi, @chrishendra93 Thank you for your reply! Yes. Basically, I referred to the command lines used in xPore.

The mmi file was generated as follows.

minimap2 -ax map-ont -t 8 -uf -k14 -d Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.mmi Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa

Here is the minimap2 command line (I used the command line from the xPore manual). The fastq file was downloaded.

minimap2 -ax map-ont -uf -t 8 --secondary=no Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.mmi HEK293T-WT-rep1.fastq.gz > HEK293T-WT-rep1.sam 2>> HEK293T-WT-rep1.sam.log
samtools view -Sb HEK293T-WT-rep1.sam | samtools sort -o HEK293T-WT-rep1.bam - &>>HEK293T-WT-rep1.bam.log
samtools index HEK293T-WT-rep1.bam &>> HEK293T-WT-rep1.bam.index.log
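
One way to check whether two alignment runs differ before dataprep is to count primary alignments per transcript in each SAM file. The sketch below is an illustration and not a step of the xPore or m6anet pipelines; `primary_counts` is a hypothetical helper, and it relies only on the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
from collections import Counter
import io

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def primary_counts(sam_lines):
    """Count primary, mapped alignments per reference name in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts


# Toy SAM text; real input would be an open .sam file.
sam_text = io.StringIO(
    "@SQ\tSN:ENST0001\tLN:1000\n"
    "r1\t0\tENST0001\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t256\tENST0001\t40\t0\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
    "r4\t0\tENST0002\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r5\t2048\tENST0002\t90\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)
counts = primary_counts(sam_text)
print(dict(counts))
```

Large per-transcript differences between two such tables would point at the alignment step (reference, index, or minimap2 options) rather than at later processing.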

Here is data.readcount file for HEK293T-WT-rep1. data.readcount.gz

Also, I was wondering whether your pipeline included any QC filtering of the fast5 or fastq files (e.g. MinKNOW or pycoQC).

I appreciate your help!

kwonej0617 commented 1 year ago

Hi @chrishendra93, I just want to check whether you have had a chance to take a look at my data.readcount.gz file.

I am trying to work out which step produces a result different from yours. Your paper mentions using nf-core/nanoseq to generate the preprocessed data. I suspect the quality-control step may lead to different alignment results and hence a different data.readcount.gz. Could you please provide your nf-core/nanoseq configuration file and .nf files?

Thank you for your help!

chrishendra93 commented 1 year ago

hi @kwonej0617, apologies for the delay, I have been busy with other things as well. I will try to look through this over the weekend. Meanwhile, you can check Issue #96; it seems to have been more or less resolved there.