Open annapopova-scripps opened 1 year ago
Hi @annapopova-scripps,
Do you mind sharing the first 10 lines of the eventalign.txt
file, please?
I suspect that some entries in the eventalign.txt
file have the transcript/gene version while some do not. xpore
strips off the gene/transcript version which may explain why it is the size of a single fastq/bam diffmod.table
.
Thanks!
Best wishes, Yuk Kei
Dear Yuk Kei,
Here are the top 10 lines of the eventalign.txt for the five .fastq files I merged.
Thank you for looking into the issue,
Anna
anna@supernova:~/ONT_files/ONT_runs/nanopolish/Xpore_1/70S$ head 70S_10to14_eventalign.txt contig position reference_kmer read_index strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx RRSB-RRNA 9 AGTTT 19 t 1 111.91 8.435 0.00199AGTTT 124.34 7.49 -1.46 65656 65662 RRSB-RRNA 9 AGTTT 19 t 2 128.19 5.800 0.01527AGTTT 124.34 7.49 0.45 65610 65656 RRSB-RRNA 10 GTTTG 19 t 3 79.84 1.955 0.01560GTTTG 78.97 1.97 0.39 65563 65610 RRSB-RRNA 11 TTTGA 19 t 4 89.31 2.765 0.01394TTTGA 89.94 2.65 -0.21 65521 65563 RRSB-RRNA 11 TTTGA 19 t 5 88.84 1.758 0.00764TTTGA 89.94 2.65 -0.37 65498 65521 RRSB-RRNA 11 TTTGA 19 t 6 88.55 1.466 0.00963TTTGA 89.94 2.65 -0.46 65469 65498 RRSB-RRNA 12 TTGAT 19 t 7 105.18 2.992 0.00564TTGAT 108.32 4.49 -0.62 65452 65469 RRSB-RRNA 12 TTGAT 19 t 8 103.04 1.315 0.00365TTGAT 108.32 4.49 -1.04 65441 65452 RRSB-RRNA 12 TTGAT 19 t 9 107.48 2.771 0.00232TTGAT 108.32 4.49 -0.16 65434 65441
Hi @annapopova-scripps,
Sorry for the delayed reply!
I supposed if your eventalign.txt
is 5 times larger, then your dataprep
is also 5 times larger. Still, when you are comparing the two conditions, the sites that differ are the same, which gives you a diffmod.table
with a similar size to using a single .fastq
file.
Thanks!
Best wishes, Yuk Kei
Hi Yuk Kei,
My eventalign.txt is 5 times larger (5x), BUT dataprep is only 1x, and clearly in the diffmod.table the coverage never exceeds 1000. In fact, from a single .fastq I should get a coverage of ~1500, and from five .fastq files ~7500 per site.
One thing I noticed is that every data.readcount from a single .fastq is the same:
idx,n_reads
RRSB-RRNA,1001
RRLB-RRNA,1001
Which makes me suspicious that dataprep has a cut off of 1000 per gene. No change every fastq. has same number of reads :)
Do you think this is true?
Best, Anna
Hi Anna,
The --readcount_max [READCOUNT_MAX]
is set to 1,000 reads by default. You can customize it by passing the --readcount_max [READCOUNT_MAX]
when running xpore dataprep
.
Thanks!
Best wishes, Yuk Kei
Hi Yuk Kei,
Thank you for the feedback, that fixed the issue and things look consistent now :)
I want to bother you for few more moments:
1) I noticed that --n_processes
option neither in xpore dataprep
nor in xpore diffmod
when specified does not engage more than one processor in Linux. Did you guys have a chance to evaluate the parallel version of xpore?
2) I have spent time to play with positions returned in the diffmod.table
and got an idea how to use z_score
and diff_mod_rate
combination to filter out most modification and variant positions in the set of control samples. Still, I am worried that diffmod.table
contains 10-30 more entries than var/mod positions, and that positions that are known to be unmodified have distinct mu_umod
and mu_mod
values, and significant individual mod_rate
values (as high as 0.9). To test this, I took an in-vitro transcribed dataset, split it into two datasets and ran xpore in a comparative mode. There were hundreds of entries in the diffmod.table
with apparently bimodal distribution of ion current detected by Xpore. This is quite alarming to me, and I am wondering if something can be done to adjust the reference model for that or addressed otherwise to make Xpore more sensitive to real modified positions and quantify stoichiometry.
Sincerely,
Anna
Hi Anna,
io
, --n_processes
uses at most 2 cpus in xpore dataprep
. Still, parallel processing should work well in xpore diffmod
from my experience.Thanks!
Best wishes, Yuk Kei
Dear Xpore developers,
Thank you for the software. It was easy to install and handle , until I decided to merge all 50+ .fastq (and .bam) output files from one nanopore run. Each .fastq roughly has 3-4K reads and on its own produces meaningful Xpore output table, when running in comparative mode. Then, I decided to merge five different .fastq (using cat) and .bam (using samtools merge), and no problem was met at the nanopolish index and nanopolish eventalign steps, consistently I observed that eventalign.txt was five times larger than for a single .fastq. However, Xpore dataprep and Xpore diffmod returned dataprep folder and diffmod.table as if a single .fastq was used.
I am currently limited to adding all individual .fastq files as replicates (below), which is stupid. So, I hope, you may have a good recommendation for the situation.
Anna
data: IVT: REP1: ./IVT_dataprep/IVT_1_dataprep REP2: ./IVT_dataprep/IVT_2_dataprep REP3: ./IVT_dataprep/IVT_3_dataprep REP4: ./IVT_dataprep/IVT_4_dataprep REP5: ./IVT_dataprep/IVT_5_dataprep REP6: ./IVT_dataprep/IVT_6_dataprep REP7: ./IVT_dataprep/IVT_7_dataprep REP8: ./IVT_dataprep/IVT_8_dataprep REP9: ./IVT_dataprep/IVT_9_dataprep REP10: ./IVT_dataprep/IVT_10_dataprep