GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

Eventalign.index from m6anet output and RNA004 compatibility #215

Open rugilemat opened 3 months ago

rugilemat commented 3 months ago

Hi team,

I want to compare m6A levels between my samples, among other analyses. I first ran m6anet to get sample-level m6A output, and I didn't have the foresight or the disk space to keep the f5c eventalign files. Is it possible to feed the eventalign.index from the m6anet dataprep step to xpore, to avoid having to regenerate the f5c eventalign files?

Also, my samples were generated with RNA004, so I just wanted to check whether it's compatible with xpore?

Thanks!

yuukiiwa commented 3 months ago

Hi @rugilemat,

xpore dataprep needs the eventalign.txt file, so feeding in the eventalign.index will not work.

The current xpore version does not have a kmer model for RNA004. We will make a new version and upload it here sometime after Easter.

Thanks!

Best wishes, Yuk Kei

rugilemat commented 3 months ago

Thanks for the quick response! Will look forward to the RNA004 model :)

Can I just quickly check: if the files were aligned to the transcriptome, will running xpore dataprep with the --genome option output the genomic coordinates of the modifications, i.e. I do not have to provide transcriptome-aligned files?

yuukiiwa commented 3 months ago

Hi @rugilemat,

You will have to provide --genome together with --transcript_fasta <cdna.fasta> and --gtf_or_gff <ref.gtf> to convert the transcriptome locations to genome locations.

Thanks!

Best wishes, Yuk Kei

rugilemat commented 3 months ago

Hi @yuukiiwa,

Thanks so much! Does the --gtf_or_gff have to be genomic or is it the transcript gtf?

yuukiiwa commented 3 months ago

Hi @rugilemat,

A transcriptomic gtf like this one would be good: https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz

Please make sure your gtf and fasta are from the same release.
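
As a rough way to check that a gtf and a fasta belong together, the sketch below compares the transcript IDs in the fasta header lines with the `transcript_id` attributes in the gtf. It is only an illustration: the function names are made up for this example, and how xpore itself matches IDs (for instance whether it keeps or drops the version suffix) is not confirmed here, which is why the check can be run both ways.

```python
import gzip
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def fasta_ids(path):
    """Collect the first whitespace-delimited token of every '>' header."""
    ids = set()
    with open_text(path) as fh:
        for line in fh:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.add(parts[0])
    return ids


def gtf_transcript_ids(path):
    """Collect every transcript_id value found in a GTF file."""
    ids = set()
    with open_text(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            match = TRANSCRIPT_ID.search(line)
            if match:
                ids.add(match.group(1))
    return ids


def strip_version(transcript_id):
    """Drop a trailing version suffix such as '.2'."""
    return transcript_id.split(".", 1)[0]


def shared_fraction(fasta_path, gtf_path, ignore_version=False):
    """Fraction of fasta transcript IDs that also appear in the GTF."""
    from_fasta = fasta_ids(fasta_path)
    from_gtf = gtf_transcript_ids(gtf_path)
    if ignore_version:
        from_fasta = {strip_version(t) for t in from_fasta}
        from_gtf = {strip_version(t) for t in from_gtf}
    if not from_fasta:
        return 0.0
    return len(from_fasta & from_gtf) / len(from_fasta)
```

A fraction close to 1 suggests the two files describe the same set of transcripts. A fraction near 0 points to mismatched releases or to differently formatted IDs.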

Thanks!

Best wishes, Yuk Kei

rugilemat commented 2 months ago

Hi @yuukiiwa ,

I think I must be doing something wrong, as I can't get xpore to work with the --genome flag. My code is:

eventalign="/scratch/prj/ppn_microglia_mod/directrna/xpore/36S_pilot_events.tsv"
gtf="/scratch/users/k19022845/refgenome/gencode.v44.primary_assembly.annotation.gtf"
ref="/scratch/users/k19022845/refgenome/gencode.v44.transcripts.fa"
output="/scratch/prj/ppn_microglia_mod/directrna/xpore/pilot/36S_trial"

xpore dataprep --eventalign ${eventalign} --gtf_or_gff ${gtf} --transcript_fasta ${ref} --out_dir ${output} --genome

This is probably a silly question, but is the --transcript_fasta a sample fasta file or a reference? I always seem to get empty json output when I run with the --genome flag, regardless of which fasta I use, but it works fine if I run the code without the flag.

Also do you have any idea when the RNA004 model might be available?

yuukiiwa commented 2 months ago

Hi @rugilemat,

Yes, xpore expects --transcript_fasta to be a cDNA fasta input. I think xpore has some problem with the gencode annotation (simple fix but I don't think we will push it to the master branch anytime soon).

For gencode, your eventalign.txt contig column looks like this:

ENST00000506640.2|ENSG00000228327.4|OTTHUMG00000002406.2|OTTHUMT00000006889.2|ENST00000506640|ENSG00000228327|6432|processed_transcript|

You have to change the contig column to the following instead:

ENST00000506640.2

I think you also have to change your fasta header line (the line starting with >) from

>ENST00000506640.2|ENSG00000228327.4|OTTHUMG00000002406.2|OTTHUMT00000006889.2|ENST00000506640|ENSG00000228327|6432|processed_transcript|

to the following

>ENST00000506640.2

Sorry for the inconvenience!
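
The two renaming steps above could be scripted along these lines. This is a minimal sketch, not part of xpore: the function names are made up for this example, and it assumes the eventalign file is tab-separated with a header row containing a column named `contig`, and that everything after the first `|` in an identifier can be dropped.

```python
def short_id(name):
    """Return the part of a GENCODE-style identifier before the first '|'."""
    return name.split("|", 1)[0]


def simplify_fasta(src, dst):
    """Copy a fasta file, shortening every '>' header to its first '|' field."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                line = ">" + short_id(line[1:].strip()) + "\n"
            fout.write(line)


def simplify_eventalign(src, dst, column="contig"):
    """Copy a tab-separated eventalign file, shortening the contig column."""
    with open(src) as fin, open(dst, "w") as fout:
        header = fin.readline()
        # Locate the contig column by name (assumed to be in the header row).
        idx = header.rstrip("\n").split("\t").index(column)
        fout.write(header)
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            fields[idx] = short_id(fields[idx])
            fout.write("\t".join(fields) + "\n")
```

Both functions write to a new file and leave the input untouched, so the originals remain available if the shortened names turn out not to be what xpore needs.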

For your RNA004 sample, did you get the eventalign.txt from f5c eventalign? If so, it should be in 9mers. I have added a branch with a kmer model based on ONT's 9mer here.

Thanks!

Best wishes, Yuk Kei

rugilemat commented 2 months ago

Hi, Thanks for the explanation - I'll give it a go after editing the Gencode annotations.

Yes, I've been using f5c eventalign for RNA004 as described in m6anet dataprep and here. The kmer model I've used for f5c is here and it's a 5mer.

How would I add in the xpore kmer model? Is there a --pretrained_model tag for diffmod like in m6anet? I couldn't find it in the documentation/help.

Sorry for all the basic questions, I'm still very new to this so all the help is highly appreciated!

yuukiiwa commented 2 months ago

Hi @rugilemat,

You can use the following 5mer model instead for xpore diffmod:

wget https://raw.githubusercontent.com/GoekeLab/xpore/RNA004_kmer_model/xpore/diffmod/RNA004_5mer_model.txt

To use a specific kmer model for xpore, you will have to add a line called prior to the config file that you pass to xpore diffmod. Here is an example:

data:
    <CONDITION_NAME_1>:
        <REP1>: <DIR_PATH_TO_DATA_JSON>
        ...

    <CONDITION_NAME_2>:
        <REP1>: <DIR_PATH_TO_DATA_JSON>
        ...

    ...

prior: <PATH_TO_KMER_MODEL>

out: <DIR_PATH_FOR_OUTPUTS>
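
For illustration, a config in the layout above could be generated with a small helper like the one below. This is a sketch under assumptions: the helper name, the condition and replicate names (KO, WT, rep1) and all paths are made up for the example, and the values are written as plain unquoted text, so paths containing characters that are special in YAML (such as `#` or `: `) would need quoting.

```python
def build_diffmod_config(data, out_dir, prior=None):
    """Build the text of a diffmod config.

    data maps condition name -> {replicate name -> dataprep output directory}.
    prior is an optional path to a kmer model file.
    """
    lines = ["data:"]
    for condition, replicates in data.items():
        lines.append(f"    {condition}:")
        for replicate, path in replicates.items():
            lines.append(f"        {replicate}: {path}")
    lines.append("")
    if prior is not None:
        lines.append(f"prior: {prior}")
        lines.append("")
    lines.append(f"out: {out_dir}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample layout; replace with your own directories.
    config_text = build_diffmod_config(
        {
            "KO": {"rep1": "./data/KO_rep1/dataprep"},
            "WT": {"rep1": "./data/WT_rep1/dataprep"},
        },
        out_dir="./out",
        prior="./RNA004_5mer_model.txt",
    )
    print(config_text)
```

Saving the printed text as, say, config.yml and running xpore diffmod --config config.yml would then use the downloaded model as the prior.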

Thanks!

Best wishes, Yuk Kei