Closed NurislamSheih closed 1 year ago
@nrhorner can you comment here please?
@NurislamSheih
I'm not sure what's occurring here. Could you share the sequence and bed file used to generate that image?
Hello @nrhorner,
Absolutely! Here is the link to the Google Drive folder - https://drive.google.com/drive/folders/1Ix2SV5_9_jOS3mW-xJDbE3RVXF2Gt4I6?usp=share_link. Inside, you will find fastq files, fasta files that were obtained from the fastq file, bed files, and a deduplicated bed file. I also included a README file that should clarify what each file contains. Thank you for your help in advance!
@NurislamSheih thanks for the detailed report. I'll get back to you as soon as I can.
@nrhorner thank you! :)
Dear developers,
I found additionally two major issues and one minor issue related to pychopper work. Maybe such behaviour occurs only with my data, but I think it could be found with any data. While using pychopper, I noticed that it saves reads with primers configurations other than +:VNP,-SSP or -:SSP,-VNP (right primers configurations according to the SQK-DCS109 kit) in the filtered fraction.
I processed a fastq file with default 4000 reads, and you can find the files related to these issues at the following link: https://drive.google.com/drive/folders/1-iLWFa5VmjDDpXFTOx2aMnX9_IaE3riL?usp=share_link):
These two examples of reads with primers hits have been saved in the filtered output.
Out of the 4000 reads, only 280 have strict +:VNP,-SSP or -:SSP,-VNP primers configurations when I parsed aligned scores in bed files.
The remaining reads have a variety of primers configurations, as shown in the distribution below:
My question is whether it is necessary for pychopper to save all other primers configurations, given that it includes the region between VNP,-SSP or SSP,-VNP primers despite the presence of another primer in the read. Should I account for the scores of these additional primers that should not normally appear in the reads? Or it is not the intended behaviour of reads filtering?
The second issue is related to rescuing reads without option x. Pychopper seems to rescue VNP,-SSP, SSP,-VNP reads, where the cDNA of each part is not equal to each other. It also considers the not planned primer configuration VNP,-SSP,SSP,-VNP (as shown in the attached screenshot) as in the previous issue.
I plotted the length of each cDNA part from rescued reads (for 30 reads as an example), and you can see that lengths of cDNA parts can be significantly different, but pychopper saves them in the rescued output.
It could be that the predicted primers configurations in some cases are wrong, but pychopper still saves not equal cDNA parts. What do you think about removing reads with not equal cDNA part from rescuing fraction?
And the last issue related to option x, which is protocol-specific read rescue:
if args.x is not None and args.x in ('DCS109'):
if args.x == "DCS109":
CONFIG = "-:VNP,-VNP"
Am I right that you are trying to save reads with VNP,-VNP primers configuration? I'm also wondered why the majority of reads have only VNP,-VNP primers configurations, but we found that it due to artefacts of reverse transcriptase that produce an antisense part due to looping on synthesised cDNA part on RNA and make pseudo duplex or something like that.
The observed problem described at least in two papers (https://doi.org/10.1038/s41467-023-36915-0, doi: 10.1261/rna.078937.121). It was found that one part of those reads is not aligned due to minimap2 can't do it, and the unaligned part, which is supposed to be the same, has much worse Q-score distributions (see below, this screenshot from figure 2b from https://doi.org/10.1038/s41467-023-36915-0)
I discovered that attempting to align the sense and antisense segments of the VNP,-VNP reads to themselves does not result in satisfactory alignment, as shown below.
But the issue related to this x option, as I understand, saves only -:VNP,-VNP read in filtered output and only trims these primers from both reads (see blow attached screenshots).
Do you have a strategy to identify the midpoint of reads where the quality significantly improves in one direction?
Sorry for the long text; I hope that my comments will be useful in improving your incredible tool.
Hi @NurislamSheih
Thanks for the detailed information, it's very useful. Could I just ask for the command used and the version of Pychopper that was used please.
Hi @nrhorner,
Thank you, I appreciate that. I used v2.7.2 of pychopper. The command was the following:
for i in $(cat to_check.txt)
do
pychopper -m edlib -x DCS109 -u ${i::-13}_unclassified_with_x_option_edlib.fastq.gz -w ${i::-13}_rescued_with_x_option_edlib.fastq.gz -r ${i::-13}_pychopper_report_with_x_option_edlib.pdf -S ${i::-13}_stats_with_x_option_edlib.txt -A ${i::-13}_aligned_score_with_x_option_edlib.bed -B 200000 -D ${i::-13}_per_reads_stats.txt -t 20 $i ${i::-13}_filtered_with_x_option_edlib.fastq.gz
pychopper -m edlib -u ${i::-13}_unclassified_edlib.fastq.gz -w ${i::-13}_rescued_edlib.fastq.gz -r ${i::-13}_pychopper_report_edlib.pdf -S ${i::-13}_stats_edlib.txt -A ${i::-13}_aligned_score_edlib.bed -D ${i::-13}_per_reads_stats.txt -t 20 $i ${i::-13}_filtered_edlib.fastq.gz
done
to_check.txt contain a line with PAK67729_pass_barcode09_f8a8ebf5_a3da931e_10.fastq.gz
@NurislamSheih For issue #3. Pychopper is saving in the output fastq reads reads that have configurations such as vnp:-ssp,-vnp because within that read pychopper was able to extra a segment (vnp:-ssp), which is trimmed to remove the sequence outside of these primers. I'm not sure if that answers this question or not?
It might be useful if you opened separate tickets for each of the other questions, and I will adress them there.
Dear @SamStudio8,
Hello, I have come across an issue in visualizing nanopore reads in IGV (as fasta) while incorporating bed files of alignment scores. Attached is an example displaying an artifactual configuration VNP,-VNP,-SSP of a nanopore read obtained using SQK-DSC109 kit. I have observed that there appear to be multiple VNP located in the middle of the road in the file, but in IGV, they seem to be located in the same place. This has led me to question the strange primers configuration displayed in the stats.txt file by pychopper. I would appreciate it if you could kindly explain this behaviour to me. Thank you in advance!