Weeks-UNC / shapemapper2

Public repository for ShapeMapper 2 releases

Shapemapper2_dmsmode_fastq_format_error #54

Closed gunjangunjan22 closed 2 weeks ago

gunjangunjan22 commented 2 weeks ago

Hello, I was using ShapeMapper2 in DMS mode with reads that I downloaded from NCBI using the SRA Toolkit, but it reports that my reads are not in FASTQ format. I ran the same reads through FastQC and they worked fine.

Started ShapeMapper v2.2.0 at 2024-06-19 12:42:58
Output will be logged to HDV_shapemapper_log.txt
Running from directory: /srv/users/gunjan/Desktop/Biotools/shapemapper2
args: --dms --name HDV --target /home/gunjan/7SK/7SK.fasta --modified --R1 /home/gunjan/storage/SRA-toolkit/sratoolkit.3.0.5-centos_linux64/bin/SRR14999511_1.fastq --R2 /home/gunjan/storage/SRA-toolkit/sratoolkit.3.0.5-centos_linux64/bin/SRR14999511_2.fastq --untreated --R1 /home/gunjan/storage/SRA-toolkit/sratoolkit.3.0.5-centos_linux64/bin/SRR14999510_1.fastq --R2 /home/gunjan/storage/SRA-toolkit/sratoolkit.3.0.5-centos_linux64/bin/SRR14999510_2.fastq --overwrite --verbose --serial
Using serial mode, which is currently required for DMS mode
Using default DMS max-bg of 0.02
You can manually override using the max-bg flag
Created pipeline at 2024-06-19 12:42:58
Running FastaFormatChecker at 2024-06-19 12:42:58 . . .

python3 /srv/users/gunjan/Desktop/Biotools/shapemapper2/internals/python/pyshapemap/../../bin/check_fasta_format.py "/home/gunjan/7SK/7SK.fasta" "shapemapper_temp/HDV/AlignPrep/FastaFormatChecker/HDV_AlignPrep_FastaFormatChecker_corrected.fasta"

from inside dir

/srv/users/gunjan/Desktop/Biotools/shapemapper2

. . . done at 2024-06-19 12:42:59
Running BowtieIndexBuilder at 2024-06-19 12:42:59 . . .

bowtie2-build "/home/gunjan/7SK/7SK.fasta" "shapemapper_temp/HDV/AlignPrep/BowtieIndexBuilder/HDV_AlignPrep_BowtieIndexBuilder_index"

from inside dir

/srv/users/gunjan/Desktop/Biotools/shapemapper2

. . . done at 2024-06-19 12:42:59
Running ProgressMonitor at 2024-06-19 12:42:59 . . .
. . . done at 2024-06-19 12:42:59
Running QualityTrimmer1 at 2024-06-19 12:42:59 . . .

shapemapper_read_trimmer -i "shapemapper_temp/HDV/Modified/ProgressMonitor/HDV_Modified_ProgressMonitor_output.fastq" -o "shapemapper_temp/HDV/Modified/QualityTrimmer1/HDV_Modified_QualityTrimmer1_trimmed.fastq" -p "20" -l "25" -w "5"

from inside dir

/srv/users/gunjan/Desktop/Biotools/shapemapper2

ERROR: Component "QualityTrimmer1" (sample:Modified) failed, giving the following error message:
ERROR: Input file shapemapper_temp/HDV/Modified/ProgressMonitor/HDV_Modified_ProgressMonitor_output.fastq does not appear FASTQ formatted.

Component "QualityTrimmer1" (sample:Modified) status: failed (return code 1)

stdout
Attempting to trim fastq file shapemapper_temp/HDV/Modified/ProgressMonitor/HDV_Modified_ProgressMonitor_output.fastq and write to shapemapper_temp/HDV/Modified/QualityTrimmer1/HDV_Modified_QualityTrimmer1_trimmed.fastq ...
Using params: window_size=5, min_phred=20, min_length=25.

stderr
ERROR: Input file shapemapper_temp/HDV/Modified/ProgressMonitor/HDV_Modified_ProgressMonitor_output.fastq does not appear FASTQ formatted.

ShapeMapper run failed at 2024-06-19 12:43:01.

I checked the fastq files, they seem okay to me.

head -n 15 SRR14999510_1.fastq

@SRR14999510.1 1 length=251
CAGACAAAAGAAAGGCAGACTGCCACATGCAGCGCCTCATTTGGATGTTTCTGGATTCTTGGAAGCTTGACTACCCTACGTTCTCCTACAAATGGACCTTGAGAGCTTGTTTGTAGGTTCTAGCAGGGGAGCGCAGCTACTCGTATACCCTTGACCGAAGACCGGTCCTCCTCTATCGGGGATTGTCGTCCTCTTCGACCGATCTCGCAGCTTCGTGATGTACTCACATTTATCTTTTATTTAGGTTATGT
+SRR14999510.1 1 length=251
11>A1CFFC@FGB1A11AFG1FF1F0F1F11B0AAEE1DFG211B1B2DDG2111BBDF11110ABF00BD1BBFF0B/BBFEFFG1B111B111BBBF111/1>BF0BFG00221BBBF22>11/////</<//<@1@@F/@/@1@??FF01??/---.<<--<<<DDCGGG0;C.----;//;9.;:CABFFF..;;---9-:-9--9:BB---/////;/9/9/:////9//://///////9////
@SRR14999510.2 2 length=250
TTTTGAAAAGAAAGGCAGACTGCCACATGCAGCGCCTCATTTGGATGTGTCTGGAGTCTTGGAAGCTTGACTACCCTACGTTCTCCTACAAATGGACCTTGAGAGCTTGTTTGGAGGTTCTAGCAGGGGAGCGCAGCTACTCGTATACTCTTGACCGAAGACCGGTCCTCCTCTATCGGGGATGTTCGTCCTCTTCGACCGAGCGCGCAGCTTCGGGAGGGACGCACATTTAGCTGTTAGGTAGGTTAGG
+SRR14999510.2 2 length=250
1>AAAAFFFFA?FBFA1ECFGCGG1F0GBGB1B0AEEE1DFG211B1B1DDG1010BBFG11110BBF00BD1BBFG0B/BFFFGGG1B111B111BBFG111/1?BG0FFG0/10/?FFG22>11/////?/<//<F1<FC/@/?1?<FGG11??/---.<<--<A<GGGHGH0;C.----:./;;.;;AAFFFF..;:---9-9-9--9;BB--------:-9-9-;///////;///-///-///--
@SRR14999510.3 3 length=250
GATTGAAAAGAAAGGCAGACTGCCACATGCAGCGCCTCATTTGGATGTGTCTGGAGTCTTGGAAGCTTGACTACCCTACGTTCTCCTACAAATGGCCCTTGAGCGCTTGTTTGGAGGTTCTAGCAGGGGAGCGCAGCTCCTCGTATACCCTTGACCGAAGACCGGTCCTCCTCTATCGGGGATGGTCGTCCGACCGACCGCGCAGCTTCGGTAGGGACGCACATGGACCGGTGTGTTAGGGTAGGGACAC
+SRR14999510.3 3 length=250
3>ABBFFFFFBFFGGFF?CGGEGHHHFHBFB5B2AEEE3DFG553B3B5DDG5351BBFG53353ABF31BD5BBFG2B1BBFFFFF3B333B432B?BF1333/>?E/BFG3/33/?BFE44B33/////?/<//<@1@@F/?/<1?<FFG01<F----.<<--<<@GFCFGC0;C.----;..;;-;AA-.99---;-9-9--999;-....../9-9-9-9/.//9----...///..:/.../9.9
@SRR14999510.4 4 length=251
CCTTCAAAAGAAAGGCAGACTGCCACATGCAGCGCCTCATTTGGATGTGTCTGGAGTCTTGGAAGCTTGACTACCCTACGTTCTCCTACAAATGGACCTTGAGAGCTTGTTTGGAGGTTCTAGCAGGGGAGCGCAGCTACTCGTATACCCTTGACCGAAGACCGGTCCTCCTCTATCGGGGATGGTCGTCCTCTTCGACCGAGCGCGCAGCTTCGGGAGGGACGCACATGGAGCGTTGAGGTAGGGTAGGG
+SRR14999510.4 4 length=251

Please help

Psirving commented 2 weeks ago

See issue #10

This is a known issue that will hopefully be fixed in the next release. It happens because ShapeMapper2 expects (wrongly) that the third line of each FASTQ entry will always be a bare "+" with nothing after it.

The work-around is to delete the contents of this line, except for the "+". You can use sed to do this:

sed -i 's/^+SRR.*/+/' file-name.fastq
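If sed is not available, or you would rather not depend on the "SRR" prefix, the same clean-up can be done by line position instead of by pattern. The following is only a minimal sketch, not part of ShapeMapper: the function name `strip_separator_annotations` and its arguments are made up for illustration, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines). Working by position also avoids touching a quality line that happens to begin with "+SRR", which a pattern-based substitution could in principle match.

```python
def strip_separator_annotations(src_path, dst_path):
    """Copy a four-line-per-record FASTQ file, reducing each third line to a bare '+'.

    Returns the number of separator lines that carried extra text.
    Hypothetical helper for illustration; assumes unwrapped records.
    """
    rewritten = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line_number, line in enumerate(src):
            # Lines 3, 7, 11, ... (zero-based index 2, 6, 10, ...) are separators.
            if line_number % 4 == 2:
                if not line.startswith("+"):
                    raise ValueError(
                        f"line {line_number + 1} is not a '+' separator: {line!r}"
                    )
                if line.rstrip("\r\n") != "+":
                    rewritten += 1
                line = "+\n"
            dst.write(line)
    return rewritten
```

Writing to a new file rather than editing in place keeps the original download intact; the cleaned copies are then what you would pass to --R1 and --R2. A return value of 0 means the file already had bare "+" lines.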

gunjangunjan22 commented 2 weeks ago

Thanks @Psirving. I hope this works.