STOmics / SAW

GNU General Public License v3.0

Increase Buf Size #138

Open minhtien-trinh opened 3 months ago

minhtien-trinh commented 3 months ago

Hello,

I'm encountering an error when attempting to build a mapping using the following command:

bash /home/minh/st_test/stereoPipeline_minh_v7.1.sh \
-sif /home/minh/SAW_7.1.sif \
-splitCount 1 \
-maskFile /home/minh/Chip_C.barcodeToPos.h5 \
-fq1 /home/minh/ChipC_R1.fastq.gz \
-fq2 /home/minh/ChipC_R2.fastq.gz \
-speciesName Fomes_fomentarius \
-tissueType ChipC \
-refIndex /home/minh/st_test/test \
-annotationFile /home/minh/GCA_022606135.1_Fomfom1_genomic.gtf \
-rRNAremove : N \
-threads 16 \
-outDir /home/minh/st_test/results \
--limitIObufferSize 2000000000 \
--limitOutSAMoneReadBytes 200000 \
--limitBAMsortRAM 4000000000 \
--limitGenomeGenerateRAM 5000000000

Error Details:

Full Error Traceback

```
Fr 9. Aug 11:04:50 CEST 2024 singularity check: pass, and singularity path is /usr/local/bin/singularity
Fr 9. Aug 11:04:50 CEST 2024 singularity image file check: file exist and SIF path is /home/minh/SAW_7.1.sif
Fr 9. Aug 11:04:50 CEST 2024 => splitMask, compute CID count and predict the memory of mapping start......
WARNING: While bind mounting '/home/minh:/home/minh': destination is already in the mount point list
    Command being timed: "singularity exec /home/minh/SAW_7.1.sif CIDCount -i /home/minh/Chip_C.barcodeToPos.h5 -s Fomes_fomentarius -g 1G"
    User time (seconds): 23.52
    System time (seconds): 1.63
    Percent of CPU this job got: 120%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.91
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 694544
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 79
    Minor (reclaiming a frame) page faults: 120643
    Voluntary context switches: 272085
    Involuntary context switches: 231
    Swaps: 0
    File system inputs: 10068
    File system outputs: 8
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
Your sequencing reads are in Q40 format.
Fr 9. Aug 11:05:11 CEST 2024 => CID mapping, adapter filtering and RNA alignment start......
~~~ mapping - ChipC_R1.fastq.gz ~~~
WARNING: While bind mounting '/home/minh:/home/minh': destination is already in the mount point list
WARNING: While bind mounting '/home/minh:/home/minh': destination is already in the mount point list
WARNING: While bind mounting '/home/minh:/home/minh': destination is already in the mount point list
--- Error: The length of reads exceeds the size of buf.
--- Error: The length of reads exceeds the size of buf.
--- Error: The length of reads exceeds the size of buf.
--- Error: The length of reads exceeds the size of buf.
bcSTAR: ReadsParse.cpp:911: ThreadBufWrapper::GetBatchReads(ReadsBuf*, char (*)[128], meta*, mmp*, stage*)::: Assertion `false' failed.
bcSTAR: ReadsParse.cpp:911: ThreadBufWrapper::GetBatchReads(ReadsBuf*, char (*)[128], meta*, mmp*, stage*)::: Assertion `false' failed.
--- Error: The length of reads exceeds the size of buf.
--- Error: The length of reads exceeds the size of buf.
--- Error: The length of reads exceeds the size of buf.
bcSTAR: ReadsParse.cpp:911: ThreadBufWrapper::GetBatchReads(ReadsBuf*, char (*)[128], meta*, mmp*, stage*)::: Assertion `false' failed.
bcSTAR: ReadsParse.cpp:911: ThreadBufWrapper::GetBatchReads(ReadsBuf*, char (*)[128], meta*, mmp*, stage*)::: Assertion `false' failed.
--- Error: The length of reads exceeds the size of buf.
/usr/local/bin/mapping: line 1: 443009 Aborted /opt/saw_st_software/pipeline/mapping/bcSTAR $*
Command exited with non-zero status 134
    Command being timed: "singularity exec /home/minh/SAW_7.1.sif mapping --outSAMattributes spatial --outSAMtype BAM SortedByCoordinate --genomeDir /home/minh/st_test/test --runThreadN 16 --outFileNamePrefix /home/minh/st_test/results/00.mapping/ChipC_R1. --sysShell /bin/bash --stParaFile /home/minh/st_test/results/00.mapping/ChipC_R1.bcPara --readNameSeparator " " --limitBAMsortRAM 63168332971 --limitOutSJcollapsed 10000000 --limitIObufferSize=280000000 --outBAMsortingBinsN 50 --outSAMmultNmax 1"
    User time (seconds): 98.46
    System time (seconds): 11.05
    Percent of CPU this job got: 141%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:17.37
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 31013536
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 91
    Minor (reclaiming a frame) page faults: 281961
    Voluntary context switches: 719490
    Involuntary context switches: 393
    Swaps: 0
    File system inputs: 11846
    File system outputs: 48
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 134
```

I have attempted to increase various buffer and RAM limits, as shown in the command, but the issue persists. The error appears to stem from a potential hardcoded limit in bcSTAR, specifically in the ReadsParse.cpp file. When I attempt to access bcSTAR inside Singularity at /opt/saw_st_software/pipeline/mapping/bcSTAR, I encounter binary output, suggesting that the source code is not included within the container. Additionally, I looked through the bcSTAR source code repository, but I could not locate the ReadsParse.cpp file.

Could you provide guidance on how to resolve this buffer size issue? Specifically:

Thank you for your assistance.

Clouate commented 3 months ago

Hi, this error may occur because your fastq files were not read correctly or are incomplete, for example in the following situations:

1) They are soft links.
2) The content stored in the fastq files is incorrect, as shown in the attached screenshot.
3) The fastq files are incomplete; for example, `zcat 1.fq.gz | tail` reports 'unexpected end'.

Could you find the *.bcPara file in 00.mapping and check whether 'in1' and 'in2' correctly record the paths to the fastq files, and also verify the files' integrity, for example by comparing MD5 values?
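The file-level checks listed above (soft link, truncated gzip stream, MD5 value) can also be scripted. The following is a minimal Python sketch using only the standard library; the function name `check_fastq_gz` and the report keys are hypothetical and not part of SAW. It does not validate the FASTQ content itself (situation 2).

```python
import gzip
import hashlib
import os
import zlib


def check_fastq_gz(path, chunk_size=1 << 20):
    """Report whether `path` is a soft link, whether its gzip stream is
    complete, and the MD5 of the compressed file (hypothetical helper)."""
    report = {
        "is_symlink": os.path.islink(path),
        "gzip_complete": True,
        "md5": None,
    }

    # MD5 of the file as stored on disk, comparable to the checksum
    # delivered with the sequencing data.
    md5 = hashlib.md5()
    with open(path, "rb") as raw:
        for chunk in iter(lambda: raw.read(chunk_size), b""):
            md5.update(chunk)
    report["md5"] = md5.hexdigest()

    # Decompress the whole stream and discard the data. A truncated file
    # raises EOFError, the counterpart of zcat's "unexpected end" message;
    # corrupt data raises OSError (BadGzipFile) or zlib.error.
    try:
        with gzip.open(path, "rb") as gz:
            while gz.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        report["gzip_complete"] = False

    return report
```

`hexdigest()` returns lowercase hex, so a checksum supplied in uppercase should be compared with `.lower()` applied first.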

minhtien-trinh commented 3 months ago

Thank you for the fast response. Here is the output for the .bcPara file:

in=/home/minh/Chip_C.barcodeToPos.h5
in1=/home/minh/ChipC_R1.fastq.gz
in2=/home/minh/ChipC_R2.fastq.gz
barcodeReadsCount=/home/minh/st_test/results/00.mapping/ChipC_R1.barcodeReadsCount.txt
barcodeStart=0
barcodeLen=25
umiStart=25
umiLen=10
mismatch=1
bcNum=599522906
polyAnum=15
mismatchInPolyA=2
Files in Home Directory

```
minh@teamcp1:~$ pwd
/home/minh
minh@teamcp1:~$ ls
Chip_C.barcodeToPos.h5  GCA_022606135.1_Fomfom1_genomic.fna  Log.out      st_test
ChipC_R1.fastq.gz       GCA_022606135.1_Fomfom1_genomic.gff  output.txt   time-1.9
ChipC_R2.fastq.gz       GCA_022606135.1_Fomfom1_genomic.gtf  SAW_7.1.sif  time-1.9.tar.gz
GCA_022606135.1.faa     local                                _STARtmp
minh@teamcp1:~$
```

The path to the fastq files seems to be correct from what I can tell.
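The comparison between the .bcPara entries and the directory listing can be automated. This is a minimal sketch that assumes the .bcPara file consists only of `key=value` lines, as in the listing above; `parse_bcpara` and `missing_inputs` are hypothetical helper names, not SAW functions.

```python
import os


def parse_bcpara(text):
    """Parse 'key=value' lines into a dict; blank or malformed lines are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def missing_inputs(params, keys=("in", "in1", "in2")):
    """Return the keys whose recorded path is absent or is not an existing file."""
    return [k for k in keys if not os.path.isfile(params.get(k, ""))]
```

Calling `missing_inputs(parse_bcpara(open(path).read()))` on the .bcPara file returns an empty list when all three input paths resolve. Note that `os.path.isfile` follows soft links, so this does not replace the `ls -l` check.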

Checking for file integrity:

MD5 check

Chip 1:
`0EA81C70C2731204FAB426C7D418332B` <- original
`0ea81c70c2731204fab426c7d418332b` <- server /home/minh/ChipC_R1.fastq.gz

Chip 2:
`C200D9F0C7D983C63F310AB58619E662` <- original
`c200d9f0c7d983c63f310ab58619e662` <- server /home/minh/ChipC_R2.fastq.gz

zcat /home/minh/ChipC_R1.fastq.gz | tail

```
+
:F:FF:,FFFF:F:FFFFF::FFF:F,:,F:,,:,::::,F,F,::,:F,FF,::,F,FF,,FFF,,F:,F:,,,FF:,F,:FF,F,FF,,,,,,,,,,:,F,F,,,:,:,,::,,FF,:,,FF,:F,,F,,,,,F,:,,,,,F,,:F,,:
@A01685:274:H3V7FDSXC:2:2678:31765:37059:ATAGAGTT 1:N:0:ACTCTTAG+AATCCACG
CGTCAGCAGCTCTCAGTACGTCAGCAGTCTCTCAGTACGTCAGCAGTCAGTACGTCAGCAGCCTCTCAGTACGTCAGCAGGCCTCTCAGTACGTCAGCAGTTCGTCCGTCTCCAACTCCCCGCCTCTCAGTACGTCAGCAGGCCGCTCATT
+
FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::,FFFFF:FFF:FF:FFF,FF:F::F
@A01685:274:H3V7FDSXC:2:2678:32723:37059:GTGATGAT 1:N:0:ACTCTTAG+AATCCACG
CTGCTGACGTACTGAGAGGCGGGAGGGGGAGTGAGCGTGGGAGTGGGGACTCGGGGGAGGAGATACCTGGCAGGGCAGGACCTCGCTACATCTTGTCTTGCCGGCGTGACAAGCGGTGCAGGCCCTGCGTACCGACGGCGCCTTGAGATCG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:
```

zcat /home/minh/ChipC_R2.fastq.gz | tail

```
+
,F,,F,:F,,:F:FFFF,F,FFF::,,,:F:,F::,FFFFF,,:,,:,FF,,F:FF,F,FFFFFFF,FF:,F::,,F,,:F,,:F,FFF,,FF,,,,,:FF,FF,F:,,F::FF:F:,::,FFFF,F,FF,F,,:,F:F::F,FF,,FFF,
@A01685:274:H3V7FDSXC:2:2678:31765:37059:ATAGAGTT 2:N:0:ACTCTTAG+AATCCACG
CTGCTGACGTACTGAGAGGCTGCTGACGTCTGAGACTGCTGACGTACTGAGAGGACTGCTGACGTACTGAGAGCTGCGGGCGTACTGACTGCTGTAGTAAGGAGAGGCCTGCGGGCGGACTGAGAGGAGGGGAGGTGGGGACGGTCGAGCT
+
FFFFFF:F:FFFFFFFFF:FFFF,FFF:,,,FFFF,FFF,F:FFFF,::,FFFFFFFFFFFF:FFF,:FFF,FF:F:,F,:FF:FFF,F,FF,F,,F,:,,::F,F:,F,,,,F:FFF,,,:,,,FF,FFFF,,,,FF,F,,FF,,FFF,:
@A01685:274:H3V7FDSXC:2:2678:32723:37059:GTGATGAT 2:N:0:ACTCTTAG+AATCCACG
CAAGGCGCCGTCGGTACGCAGGGCCTGCACCGCTTGTCACGCCGGCAAGACAAGATGTAGCGAGGTCCTGCCCTGCCAGGTATCTCCTCCCCCGAGTCCCCACTCCCACGCTCACTCCCCCTCCCGCCTCTCAGTACGTCAGCAGAGATCG
+
FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F:FF:FFFF:FFFFFFFFFFFF
```

soft link check

```
minh@teamcp1:~$ ls -l /home/minh/ChipC_R1.fastq.gz /home/minh/ChipC_R2.fastq.gz
-rw-r--r-- 1 cpohl cpohl 13313446748  9. Feb 17:18 /home/minh/ChipC_R1.fastq.gz
-rw-r--r-- 1 cpohl cpohl 14398297929 10. Feb 10:50 /home/minh/ChipC_R2.fastq.gz
```

The files look fine to me but I'm uncertain if the content is stored correctly. Do you know if the tail for the fastq files is correct?

AritaZ-hang commented 3 months ago

I have the exact same problem. I've checked my R1.fastq.gz and R2.fastq.gz following the steps minhtien-trinh described above and confirmed that both files are intact and are not soft links. What went wrong in the alignment? I've run SAW 7.1 multiple times but never encountered this error before.

minhtien-trinh commented 3 months ago

This is the first time I'm using SAW, but my guess is that my RNA-seq reads are too long (151 bp). In this paper, where SAW was also used, the reads are 35/100 bp long.

Clouate commented 3 months ago

> This is the first time I'm using SAW, but my guess is that my RNA-seq reads are too long (151 bp). In this paper, where SAW was also used, the reads are 35/100 bp long.

@minhtien-trinh Yes, the maximum read length supported by SAW is 127 bp. Was the library for your sample constructed with a method different from ours? If so, we are sorry that SAW currently offers no parameter to enable such long reads. We recommend that you use ST_BarcodeMap (https://github.com/STOmics/ST_BarcodeMap) to decode your h5 file, obtain the CID sequences and their spatial coordinates, and then use other tools for mapping and annotation.

Clouate commented 3 months ago

> I have the exact same problem. I've checked my R1.fastq.gz and R2.fastq.gz following the steps minhtien-trinh described above and confirmed that both files are intact and are not soft links. What went wrong in the alignment? I've run SAW 7.1 multiple times but never encountered this error before.

@AritaZ-hang Hi, could you attach the content of your .bcPara file and run the following command to check whether any reads are longer than 127 bp? `gunzip -c /path_to_your/R2.fq.gz | awk '{if(length($0)>127){print}}'`
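The awk one-liner tests every line of the file, so header and quality lines longer than 127 characters are printed as well. A minimal Python sketch that restricts the check to sequence lines and summarizes the length distribution is given here; it assumes a standard 4-line-per-record FASTQ, and the function names are hypothetical. The default limit of 127 is the value stated earlier in this thread.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_gz, max_reads=None):
    """Count sequence lengths in a gzipped FASTQ with 4 lines per record."""
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        # The second line of every 4-line record is the sequence.
        seq_lines = islice(handle, 1, None, 4)
        if max_reads is not None:
            # Optionally inspect only the first `max_reads` records.
            seq_lines = islice(seq_lines, max_reads)
        for seq in seq_lines:
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts


def count_over_limit(counts, limit=127):
    """Number of reads longer than `limit` bases."""
    return sum(n for length, n in counts.items() if length > limit)
```

For files of the size shown in this thread (13 to 14 GB compressed), passing something like `max_reads=100000` gives a quick answer without scanning the whole file.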

AritaZ-hang commented 3 months ago

> > I have the exact same problem. I've checked my R1.fastq.gz and R2.fastq.gz following the steps minhtien-trinh described above and confirmed that both files are intact and are not soft links. What went wrong in the alignment? I've run SAW 7.1 multiple times but never encountered this error before.
>
> @AritaZ-hang Hi, could you attach the content of your .bcPara file and run the following command to check whether any reads are longer than 127 bp? `gunzip -c /path_to_your/R2.fq.gz | awk '{if(length($0)>127){print}}'`

Hi Clouate, I've checked my R2 file and found that almost all the reads are 151 bp long, which exceeds the 127 bp maximum. I will use other tools to map this spatial data. Thanks for your prompt reply. The contents of the .bcPara file are listed below; everything looks fine.

in=/home/arita/spatial/data//barcodeToPos.h5
in1=/home/arita/spatial/fastqs//H_R1_final.fastq.gz
in2=/home/arita/spatial/fastqs//H_R2_final.fastq.gz
barcodeReadsCount=/home/arita/spatial/test/results/00.mapping/H_R1_final.barcodeReadsCount.txt
barcodeStart=0
barcodeLen=25
umiStart=25
umiLen=10
mismatch=1
bcNum=359440836
polyAnum=15
mismatchInPolyA=2