Hi @Soreic, my guess is that your FASTQ file is incorrectly formatted. What does the FASTQ look like?
Hi @nrhorner
Thanks for your answer and sorry for the late response!
Here are the first 10 lines of one fastq file:
@SRR1804.1.1 1 length=238 GTAGCAACTTCGTTCGGTTATCAGATGGGTGTTTATGATCATCGCCTACCGTGAAACGTTTCCTCTATCCCGGAGGGAATGGAACTTGCCTATCACTCTATTCTTTTTTTTGGTGGTTCGTGTGATTCGAACCTACGACCAATTGATTCAAAAAAACCAACTGCTCTACCGACTGAGCGACCCCCAGCAATATCAGCACCAACGAAATCCATTCCCTCCGATAGATGAAACATCCACA +SRR1804.1.1 1 length=238 $&&%%(%%*+$)122%)/--+*))*(%1633*:3-'&%(%&'(),-,)$$$#%$+'&%+:5-)8882263-,0710247,,+*5354'/10%.,(+0+&#$($'+.,)('%&,34688,)()%$&12*/1568:;88645$&#%*((+&784+01+7:<756$$$$$'<?5A=;:<846:66:;<:$8;8<;9=:9<86484%%'86220.0-,+*))10*((-%$$()&..225#$$ @SRR1804.2.1 2 length=260 TGTTGTCTTGATTATCTGGAGAGTGATGTTTATGATTAACACAGCCTACGTGACGTTTCATATCGGAGGGAATGACCCTACTATGTTGGTGCTGATATTGCTGGGGAGTGCTCTGGCCCGTTGTGAACGGTCCTCCCCACGCTTATAACGGAATTGGAATGGAAACCTCCACCTAATTTGAAAAAAAGAAAAAAAAAAAAAAAAAAAAGAAGATAGAGCGACAGGCAAAGTTCCATTCCCTCCGATAGATGAAACGTCAA +SRR1804.2.1 2 length=260 $#)*)-'$%&$%&$.)$$'$%$*)*%((%&$#&&%'(%#&%&%$())$13038)).0)''.163-2:61276+'$#%%&%03102:8134.7;=?>?>>55660)&%'(%%*%&'-/20.32.'''9--0=;;A=>=:&#%%&&$$,*$$&#&$&$%%,%',./1&(122(430.-$'&#$%&)-+&%---++++++++++**+++*('5451---'1//&$$**-35556@%359=??21=/====;;9:>A;8653*' @SRR1804.3.1 3 length=255 ATGTACTTCGTTCAGATTACCCTCGGGTAGGTGTTTATGGGATAGCCATCTACCGTGACGTTTCATCTACTATCGGAGGGAATGGATTTCTGTTGGTGCTGATATTGCTGAGGGGGTGATTAAGCTCGGCTGGAGGCATCGCGCTTTAAGCAGAGGGTCGGCGGTTCGATCCGTCATGCTTGCAAATTAAAAAAAAAAAAAAAAAAGAAGATGAGGCGACAGGCAAGTTCCATTCCCTCCGATAGATGAAAATCA
Is the problem the sequence identifier that is changed by SRA?
Thank you so much for your help!
Hi @Soreic
Sorry for my late response as well.
There were missing newlines in the reads you pasted.
The following works for me:
@SRR1804.1.1 1 length=238
GTAGCAACTTCGTTCGGTTATCAGATGGGTGTTTATGATCATCGCCTACCGTGAAACGTTTCCTCTATCCCGGAGGGAATGGAACTTGCCTATCACTCTATTCTTTTTTTTGGTGGTTCGTGTGATTCGAACCTACGACCAATTGATTCAAAAAAACCAACTGCTCTACCGACTGAGC
+SRR1804.1.1 1 length=238
$&&%%(%%*+$)122%)/--+*))*(%1633*:3-'&%(%&'(),-,)$$$#%$+'&%+:5-)8882263-,0710247,,+*5354'/10%.,(+0+&#$($'+.,)('%&,34688,)()%$&12*/1568:;88645$&#%*((+&784+01+7:<756$$$$$'<?5A=;:<84
which results in:
Output fragments failing length filter (length < 50): 0
-----------------------------------
Reads with two primers: 100.00%
Rescued reads: 0.00%
Unusable reads: 0.00%
-----------------------------------
I hope this fixes your issue, but if not, please open another ticket.
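The missing-newline problem described above can be checked before running pychopper. The following is a minimal sketch, assuming the common layout of exactly four lines per record (header, sequence, separator, quality). The function name `check_fastq_records` is hypothetical and is not part of pychopper.

```python
from typing import Iterable, List


def check_fastq_records(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found, assuming four lines per FASTQ record."""
    problems = []
    rows = [line.rstrip("\n") for line in lines]
    # Ignore empty lines at the end of the input.
    while rows and rows[-1] == "":
        rows.pop()
    if len(rows) % 4 != 0:
        problems.append(f"line count {len(rows)} is not a multiple of 4")
    # Inspect every complete group of four lines.
    for start in range(0, len(rows) - 3, 4):
        header, seq, sep, qual = rows[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {record}: header does not start with '@'")
        if not sep.startswith("+"):
            problems.append(f"record {record}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(
                f"record {record}: sequence length {len(seq)} "
                f"!= quality length {len(qual)}"
            )
    return problems
```

Passing an open file handle, for example `check_fastq_records(open("reads.fastq"))` with a placeholder file name, returns an empty list when every record has the expected shape. A file whose records were joined onto one line would be reported as having a line count that is not a multiple of 4.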
Hi, I am using this code to run pychopper on fastq files:
pychopper \
  -r $pychopped/barcode$BARCODE/report.pdf \
  -u $pychopped/barcode$BARCODE/unclassified.fastq \
  -w $pychopped/barcode$BARCODE/rescued.fastq \
  -k PCS109 \
  -Y 10000 \
  -B 1000000 \
  -t 8 \
  $input/$file \
  $pychopped/barcode$BARCODE/pychopped_barcode$BARCODE.fastq
And I get this output:
Processing file: SRR1804_barcode01.fastq
Using kit: /home/usr/miniconda3/envs/pychopper/lib/python3.8/site-packages/pychopper/primer_data/cDNA_SSP_VNP.fas
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Counting fastq records in input file: /vol/usr/Tests/ONT/ONT-seq/fastq_data/raw/SRR1804_barcode01.fastq
Total fastq records in input file: 0
Tuning the cutoff parameter (q) on 7625273 sampled reads (100.0%) passing quality filters (Q >= 7.0).
Optimizing over 30 cutoff values.
100%|██████████| 30/30 [27:29:04<00:00, 3609.46s/it]
Best cutoff (q) value is 0.8621 with 90% of the reads classified.
Processing the whole dataset using a batch size of 1:
182882it [38:08:25, 1.20it/s]
Version of pychopper in my conda env, installed using mamba: pychopper 2.7.9 py_0 nanoporetech
Am I not using pychopper correctly or is there something wrong with my fastq files since pychopper does not find any reads in the input file?
Thank you for your help!
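To compare against the "Total fastq records in input file: 0" line in the log above, the records can be counted independently of pychopper. This is a rough sketch that assumes exactly four lines per record and plain or gzip-compressed text. The helper name `count_fastq_records` is hypothetical.

```python
import gzip


def count_fastq_records(path: str) -> int:
    """Count FASTQ records, assuming exactly four lines per record."""
    # Pick gzip.open for .gz files, the built-in open otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    # Integer division: an incomplete trailing record is not counted.
    return n_lines // 4
```

If this count is far from the number of reads the sequencing run should contain, the file layout is worth inspecting before looking at the pychopper options.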