epi2me-labs / pychopper

cDNA read preprocessing

Pychopper is not taking the specified parameters and is not finding records in input file #54

Closed Soreic closed 6 months ago

Soreic commented 9 months ago

Hi, I am using this code to run pychopper on fastq files:

pychopper \
-r $pychopped/barcode$BARCODE/report.pdf \
-u $pychopped/barcode$BARCODE/unclassified.fastq \
-w $pychopped/barcode$BARCODE/rescued.fastq \
-k PCS109 \
-Y 10000 \
-B 1000000 \
-t 8 \
$input/$file \
$pychopped/barcode$BARCODE/pychopped_barcode$BARCODE.fastq

And I get this output:

Processing file: SRR1804_barcode01.fastq
Using kit: /home/usr/miniconda3/envs/pychopper/lib/python3.8/site-packages/pychopper/primer_data/cDNA_SSP_VNP.fas
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Counting fastq records in input file: /vol/usr/Tests/ONT/ONT-seq/fastq_data/raw/SRR1804_barcode01.fastq
Total fastq records in input file: 0
Tuning the cutoff parameter (q) on 7625273 sampled reads (100.0%) passing quality filters (Q >= 7.0).
Optimizing over 30 cutoff values.
100%|████████████████████| 30/30 [27:29:04<00:00, 3609.46s/it]
Best cutoff (q) value is 0.8621 with 90% of the reads classified.
Processing the whole dataset using a batch size of 1:
182882it [38:08:25, 1.20it/s]

Version of pychopper in my conda env, installed using mamba: pychopper 2.7.9 py_0 nanoporetech

Am I not using pychopper correctly or is there something wrong with my fastq files since pychopper does not find any reads in the input file?

Thank you for your help!

nrhorner commented 9 months ago

Hi @Soreic, my guess is that your fastq is incorrectly formatted. What does the fastq look like?
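
One way to check the formatting is a short script like the one below. This is only a minimal sketch and is not part of pychopper: `check_fastq` is a hypothetical helper name, and it assumes plain, unwrapped four-line FASTQ records (header starting with `@`, sequence, separator starting with `+`, quality string of the same length as the sequence). The reads in the demo are made up for illustration.

```python
def check_fastq(lines, max_problems=5):
    """Count well-formed four-line FASTQ records and collect problems.

    `lines` is any iterable of text lines, for example an open file handle.
    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    records = 0
    problems = []
    chunk = []
    start = 1  # line number where the current record begins
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not chunk:
            if line == "":
                continue  # tolerate blank lines between records
            start = lineno
        chunk.append(line)
        if len(chunk) == 4:
            header, seq, sep, qual = chunk
            if not header.startswith("@"):
                problems.append(f"line {start}: header does not start with '@'")
            elif not sep.startswith("+"):
                problems.append(f"line {start + 2}: separator does not start with '+'")
            elif len(seq) != len(qual):
                problems.append(
                    f"line {start}: sequence length {len(seq)} "
                    f"!= quality length {len(qual)}"
                )
            else:
                records += 1
            chunk = []
            if len(problems) >= max_problems:
                break
    if chunk and len(problems) < max_problems:
        problems.append(
            f"line {start}: incomplete record ({len(chunk)} of 4 lines)"
        )
    return records, problems


if __name__ == "__main__":
    # Made-up reads: one well-formed record, and the same record with its
    # newlines replaced by spaces.
    good = ["@r1 1 length=4\n", "ACGT\n", "+r1 1 length=4\n", "IIII\n"]
    flat = ["@r1 1 length=4 ACGT +r1 1 length=4 IIII\n"]
    print(check_fastq(good))
    print(check_fastq(flat))
```

To run it on a real file, pass an open handle, e.g. `with open(path) as fh: print(check_fastq(fh))`. A record count of zero together with an "incomplete record" message would point at missing newlines rather than at the read identifiers.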

Soreic commented 8 months ago

Hi @nrhorner

Thanks for your answer and sorry for the late response! Here are the first 10 lines of one fastq file: @SRR1804.1.1 1 length=238 GTAGCAACTTCGTTCGGTTATCAGATGGGTGTTTATGATCATCGCCTACCGTGAAACGTTTCCTCTATCCCGGAGGGAATGGAACTTGCCTATCACTCTATTCTTTTTTTTGGTGGTTCGTGTGATTCGAACCTACGACCAATTGATTCAAAAAAACCAACTGCTCTACCGACTGAGCGACCCCCAGCAATATCAGCACCAACGAAATCCATTCCCTCCGATAGATGAAACATCCACA +SRR1804.1.1 1 length=238 $&&%%(%%*+$)122%)/--+*))*(%1633*:3-'&%(%&'(),-,)$$$#%$+'&%+:5-)8882263-,0710247,,+*5354'/10%.,(+0+&#$($'+.,)('%&,34688,)()%$&12*/1568:;88645$&#%*((+&784+01+7:<756$$$$$'<?5A=;:<846:66:;<:$8;8<;9=:9<86484%%'86220.0-,+*))10*((-%$$()&..225#$$ @SRR1804.2.1 2 length=260 TGTTGTCTTGATTATCTGGAGAGTGATGTTTATGATTAACACAGCCTACGTGACGTTTCATATCGGAGGGAATGACCCTACTATGTTGGTGCTGATATTGCTGGGGAGTGCTCTGGCCCGTTGTGAACGGTCCTCCCCACGCTTATAACGGAATTGGAATGGAAACCTCCACCTAATTTGAAAAAAAGAAAAAAAAAAAAAAAAAAAAGAAGATAGAGCGACAGGCAAAGTTCCATTCCCTCCGATAGATGAAACGTCAA +SRR1804.2.1 2 length=260 $#)*)-'$%&$%&$.)$$'$%$*)*%((%&$#&&%'(%#&%&%$())$13038)).0)''.163-2:61276+'$#%%&%03102:8134.7;=?>?>>55660)&%'(%%*%&'-/20.32.'''9--0=;;A=>=:&#%%&&$$,*$$&#&$&$%%,%',./1&(122(430.-$'&#$%&)-+&%---++++++++++**+++*('5451---'1//&$$**-35556@%359=??21=/====;;9:>A;8653*' @SRR1804.3.1 3 length=255 ATGTACTTCGTTCAGATTACCCTCGGGTAGGTGTTTATGGGATAGCCATCTACCGTGACGTTTCATCTACTATCGGAGGGAATGGATTTCTGTTGGTGCTGATATTGCTGAGGGGGTGATTAAGCTCGGCTGGAGGCATCGCGCTTTAAGCAGAGGGTCGGCGGTTCGATCCGTCATGCTTGCAAATTAAAAAAAAAAAAAAAAAAGAAGATGAGGCGACAGGCAAGTTCCATTCCCTCCGATAGATGAAAATCA

Is the problem the sequence identifier that is changed by SRA?

Thank you so much for your help!

nrhorner commented 6 months ago

Hi @Soreic

Sorry for my late response also.

There were missing newlines in the reads you pasted.

The following works for me:

@SRR1804.1.1 1 length=238
GTAGCAACTTCGTTCGGTTATCAGATGGGTGTTTATGATCATCGCCTACCGTGAAACGTTTCCTCTATCCCGGAGGGAATGGAACTTGCCTATCACTCTATTCTTTTTTTTGGTGGTTCGTGTGATTCGAACCTACGACCAATTGATTCAAAAAAACCAACTGCTCTACCGACTGAGC
+SRR1804.1.1 1 length=238
$&&%%(%%*+$)122%)/--+*))*(%1633*:3-'&%(%&'(),-,)$$$#%$+'&%+:5-)8882263-,0710247,,+*5354'/10%.,(+0+&#$($'+.,)('%&,34688,)()%$&12*/1568:;88645$&#%*((+&784+01+7:<756$$$$$'<?5A=;:<84

which results in:

Output fragments failing length filter (length < 50): 0
-----------------------------------
Reads with two primers: 100.00%
Rescued reads:      0.00%
Unusable reads:     0.00%
-----------------------------------
nrhorner commented 6 months ago

I hope this fixes your issue, but if not, please open another ticket.