GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License
131 stars 23 forks source link

Request for xpore result of HEK293T-rep2 #160

Open kwonej0617 opened 1 year ago

kwonej0617 commented 1 year ago

Hi, developer. Thank you for providing a useful tool!

I have run xpore and generated the result with HEK293T-WT-rep2 and HEK293T-Mettl3-KO-rep2 successfully. I wanted to check whether I correctly generated the input files for xpore and run xpore by comparing my result with yours. Could you share the raw xpore result of HEK293T-WT-rep2 (KO vs. WT)?

Thank you so much!

yuukiiwa commented 1 year ago

Hi @kwonej0617,

Sorry for the delayed reply! Here is the link to the HEK293T-WT-rep2 and HEK293T-Mettl3-KO-rep2 we generated: https://drive.google.com/file/d/1hGuTEiXOz7H8PuKpJ-teOhkkcYDgaUeH/view?usp=sharing

Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

@yuukiiwa Thank you for your data!

Also, I have a question regarding minimap usage. From xpore_manuel.pdf file, you provided a code for aligning the data into transcriptome: image I am not sure how to prepare MMI file. I did a search about this but I couldn't find any clue about it. Could you please explain about it? Actually, instead of using MMI file, I used fasta file to align data. Do you think it would make a different result compared to using MMI file? Here is the code I used. minimap2 -ax map-ont -uf -t 8 --secondary=no Homo_sapiens.GRCh38_v91.cdna.all_ncrna.fa fastq/all_combined.fastq.gz > bamtx/output.sam samtools view -Sb bamtx/output.sam | samtools sort -o bamtx/output.bam - &>> bamtx/output.bam.log samtools index bamtx/output.bam &>> bamtx/output.bam.log

Thank you for your help!

yuukiiwa commented 1 year ago

Hi @kwonej0617,

Here is the command for preparing an fasta.mmi file:

minimap2 -ax map-ont -t 8 -uf -k14 -d [reference].fasta.mmi [reference].fasta

Then, the alignment command looks like the following:

minimap2 -ax map-ont -t 8 -uf -k14 [reference].fasta.mmi [sample].fastq > [sample].sam

I know people who run the indexing step and the alignment step of minimap2 together in one command. I guess the reason we have them in separate processes is to have the same mmi file for all samples so that we don't have to run indexing everytime.

Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Hi, @yuukiiwa Happy new year! I just wanted to ask you which minimap2 and nanopolish version you used in generating the HEK293T-rep2 data you shared with me.

Sorry for the delayed reply! Here is the link to the HEK293T-WT-rep2 and HEK293T-Mettl3-KO-rep2 we generated: https://drive.google.com/file/d/1hGuTEiXOz7H8PuKpJ-teOhkkcYDgaUeH/view?usp=sharing

In terms of minimap2, minimap2.1 was mentioned in the paper. Did it mean this version Minimap2-2.1 (r311) https://github.com/lh3/minimap2/releases/tag/v2.1?

yuukiiwa commented 1 year ago

Hi @kwonej0617,

Yes, the minimap2 version should be the one you found. The nanopolish version we used should be 0.13.3. Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Thank you so much @yuukiiwa !

kwonej0617 commented 1 year ago

@yuukiiwa

Hi, would you mind sharing the reference file you used when running minimap2? I made it based on the description in the paper, but when I run minimap2 with the reference file and checked bitwise flag in bam/sam file, I had lots of unmapped reads (bitwise: 4).

It would be also very helpful if you could how I made my reference file. Step1: download GRCh38 Ensembl annotations release version 91 from https://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/Homo_sapiens.GRCh38.91.gtf.gz Step2: download coding and noncoding RNA FASTA file from  https://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/cds/Homo_sapiens.GRCh38.cds.all.fa.gz,  https://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz. Then, Unzip each file and combined them with cat command line. Step3:: Extract only the lines from the combined .fa file whose transcript matches the transcript ID in gtf file.

Thanks for your help!

kwonej0617 commented 1 year ago

I found a file "Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa" https://zenodo.org/record/5707193#.Y3rL9n1Bw2w. Did you use it as your reference .fa file? The file has heads like ">R2_24_1". I am not sure if it is the transcript id that can be found in gtf file.

It would be very helpful if you could share the reference file and annotation file used in your analysis.

Thank you so much! Happy new year.

yuukiiwa commented 1 year ago

Hi @kwonej0617,

Happy New Year to you too! (sorry for the delayed reply)

Yes, we used Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa together with https://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/Homo_sapiens.GRCh38.91.gtf.gz, which work fine for us despite the fasta file having heads like >R2_24_1.

Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Thank you, @yuukiiwa !

kwonej0617 commented 1 year ago

Hi @yuukiiwa I really appreciate your help so far. I've been learning how to run xPore and trying to generate the same result (https://drive.google.com/file/d/1hGuTEiXOz7H8PuKpJ-teOhkkcYDgaUeH/view?usp=sharing) as you obtained (at least similar). Based on your previous feedback on this posting and xPore manual, I run the code as follows but I got different result compared to yours. Could you please check my command lines below and give me comments or suggestions if you think I did differently to yours? Thank you so much for your time and consideration!

- Guppy multi-fast5 format was used in guppy basecalling. Also, it looks like the guppy command line from xPore manual uses gpu mode for speeding up the process. However, it didn't work well for me so I run in cpu mode.

singularity exec /share/pkg/containers/guppy-cpu/guppy-cpu-6.0.7.sif guppy_basecaller --input_path fast5 --save_path fastq_multi --flowcell FLO-MIN106 --kit SQK-RNA002 -q 0 -r --cpu_threads_per_caller 10 --num_callers 4

All fastq files were combined with cat command after guppy was finished.

- Generating the mmi file for alignment

minimap2 -ax map-ont -t 8 -uf -k14 -d Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.mmi Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa

- Alignment with minimap2 (minimap2/2.1.1, samtools/1.16.1)

minimap2 -ax map-ont -uf -t 8 --secondary=no reference/Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.mmi fastq/all_combined.fastq > bamtx/output.sam 2>> bamtx/output.sam.log
samtools view -Sb bamtx/output.sam | samtools sort -o bamtx/output.bam - &>> bamtx/outpu.bam.log
samtools index bamtx/output.bam &>> bamtx/output.bam.log

The generated sam file looks like, image Also, bam file looks like, (By the way, is it okay to have N in the sequence?) image

- Nanopolish (nanopolish/0.13.3)

singularity exec /share/pkg/containers/nanopolish/0.13.3/nanopolish-0.13.3.sif nanopolish index -d fast5_single fastq/all_combined.fastq
singularity exec /share/pkg/containers/nanopolish/0.13.3/nanopolish-0.13.3.sif nanopolish eventalign --reads fastq/all_combined.fastq --bam bamtx/output.bam --genome reference/Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa --signal-index --scale-events --summary nanopolish/summary.txt --threads 40 > nanopolish/eventalign.txt

The eventalign file looks like, image

- xPore dataprep (xpore/2.1)

xpore dataprep --eventalign nanopolish/eventalign.txt --gtf_or_gffreference/Homo_sapiens.GRCh38.91.gtf --transcript_fasta reference/Homo_sapiens.GRCh38.cdna.ncrna_wtChrIs_modified.fa --out_dir dataprep/ --genome --n_processes 8

Dataprep otuput looks like succssfully generated. image data.index file image data.log file image data.readcount image eventalign.index image

I run the same steps for HEK293T-WT-rep2 and Mettle3-rep2 data.

- xPore diffmod (xpore/2.1) config file was generated like this. I didn't set any filtering at this point.

notes: Pairwise comparison without replicates with default parameter setting.

data:
    KO:
        rep1: ../../HEK293T-Mettl3-KO-rep2/dataprep/
    WT:
        rep1: ../../HEK293T-WT-rep2/dataprep/

out: ./out

diffmod.log image I think all the process was running well. However, I got less number of m6A sites compared to your result.

862,695 ../diffmod/HEK293T-rep2/out/diffmod.table (my data)
1,439,426 ../../evaluation/HEK293-WT_rep2_diffmod.table (your data)

Also, even same m6A sites have different stats between two data.

Your data: 
ENSG00000004975.11,7229916,CTCAA,-0.7464497945275032,4.270167015813763e-17,-8.405231975071922,4.166319473377218e-05,0.7464914577222369,24.0,24.0,79.58227240337251,77.5230089668035,4.8574319276189275,1.3716342906388321,0.8441091289662174,0.4847446886159896,lower 
My data: 
ENSG00000004975,7229916,CTCAA,0.0120728162453064,0.936359057592379,0.07984685596414345,0.2577994728552101,0.2457266566099037,16.0,17.0,77.51099466895259,81.55083698893299,0.6354646692111212,3.0763108725286465,0.48148548448188344,0.29255286763960375,higher

I really appreciate your help!

yuukiiwa commented 1 year ago

Hi @kwonej0617,

Sorry for the delayed reply!

The difference is probably due to the different versions of the guppy used. We used guppy 2.1.3 instead of 6.0.7.

Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Hi, @yuukiiwa Thanks for your comment! By the way, how did you install guppy 2.1.3? I can't find the information for that. Is the guppy 2.1.3 only provided for the ont customer on nanopore community website, nanoporetech? Also, except the version of guppy, do you think all the process and command line look okay?

Lastly, could you provide me kit # and flowcell for samples as follows? HEK293T-WT-0-rep1 HEK293T-WT-0-rep2 HEK293T-WT-100-rep1 HEK293T-WT-100-rep2 HEK293T-WT-100-rep3

Thank you!

yuukiiwa commented 1 year ago

Hi @kwonej0617,

@ploy-np used guppy 2.1.3 to basecall the samples around 2018-2019 as our lab is a customer of ONT. The latest version of guppy should basecall much better than guppy 2.1.3, which we would suggest you using guppy 6.0.7 instead of basecalling with guppy 2.1.3.

Your commands look fine to me.

Here is the information on the FLO cell ID and sequencing kit for the samples: HEK293T-WT-0 HEK293T-KO-rep 1: FLO-MIN106D, SQK-RNA001 HEK293T-KO-rep 2: FLO-MIN106D, SQK-RNA002 HEK293T-WT-100 HEK293T-WT-rep 1: FLO-MIN106D, SQK-RNA001 HEK293T-WT-rep 2: FLO-MIN106D, SQK-RNA002 HEK293T-WT-rep 3: FLO-MIN106, SQK-RNA002

Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Hi, @yuukiiwa I really appreciate your quick response! Thank you!

kwonej0617 commented 1 year ago

Hi, @yuukiiwa.

I really want to replicate supplementary data1. So I performed preprocessing steps, guppy, minimap2, nanopolish and xPore with HEK293T-WT/KO-rep1,2,3 using the command lines in my previous comments posted 3 weeks ago, which is based on xPore manual documentation. Then, I compared my result to supplementary data1. However, m6a sites from my data and other statistical values in other columns were quite different from supplementary data 1. So I am assuming a different guppy version could make the different result. Could you please share guppy 2.1.3 version?

I really appreciate your help! Thank you.

yuukiiwa commented 1 year ago

Hi @kwonej0617,

It is not up to us to distribute any version of guppy as guppy is a closed-sourced software licensed under Oxford Nanopore Technologies. We will get into legal trouble if we share guppy 2.1.3 version with you. We are very sorry about that.

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Hi @yuukiiwa

Thank you for your response! I understand that. Also, could you tell me which version of xpore did you use to generate supplementary data1? If I use a different version of xPore from what you used, do you think it would affect the result, like generating different m6a sites?

I really appreciate your help!

yuukiiwa commented 1 year ago

Hi @kwonej0617,

Sorry for the delayed reply! I was on vacation last week. I don't think using different versions ofxpore would affect the m6A sites outputted.

Thanks!

Best wishes, Yuk Kei

kwonej0617 commented 1 year ago

Hi, @yuukiiwa

Thanks for your reply! Also, hope you had a great vacation! :) Yes, I also think the different versions of xPore would not make a huge difference in results. However, because I couldn't reproduce supplementary 1 data using the raw data and xPore 2.1 combination, I just want to try it. Could you please share which version you used in generating your supplementary data? Also, if you could not remember the exact version, could you please recommend a version that I could try?

Thank you so much!