STOmics / SAW

GNU General Public License v3.0

SAW-A40004: The sequencing data is empty. #147

Open joweihsieh opened 2 months ago

joweihsieh commented 2 months ago

Hi,

I'm encountering errors while using SAW to process my Stereo-seq data. Could you please help me resolve this issue?

- Below are the errors:

[ERRO 20240911-19-28-42 p1060618 load_gef matrixloader.py:137] SAW-A40004: The sequencing data is empty, please confirm the /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/processed/Cla_20240910/02.count/C04289G213.raw.gef file.
E0911 19:28:42.665338 1060618 matrixloader.py:137] SAW-A40004: The sequencing data is empty, please confirm the /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/processed/Cla_20240910/02.count/C04289G213.raw.gef file.
Namespace(conf='/home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/rawdata/image/C04289G213_SC_20240812_143348_4.0.1.ipr', core=12, func=<class 'main.Pipeline'>, gpu='-1', group=None, input='/home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/rawdata/image/C04289G213_SC_20240812_143348_4.0.1.tar.gz', output='/home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/processed/Cla_20240910/03.register', protein=None, vis='/home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/processed/Cla_20240910/02.count/C04289G213.raw.gef', whether_cell=True)
[]
Traceback (most recent call last):
  File "register-v4.3.2/register/main.py", line 539, in
  File "register-v4.3.2/register/main.py", line 535, in main
  File "register-v4.3.2/register/main.py", line 356, in init
  File "register-v4.3.2/register/main.py", line 502, in run
  File "register-v4.3.2/register/registration/registration.py", line 99, in registration
  File "register-v4.3.2/register/utils/matrixloader.py", line 36, in load
  File "register-v4.3.2/register/utils/matrixloader.py", line 138, in load_gef
Exception: SAW-A40004: The sequencing data is empty, please confirm the /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/processed/Cla_20240910/02.count/C04289G213.raw.gef file.

- Here is the log file:

https://www.dropbox.com/scl/fi/568ubwwpbkzmmcsbmidz0/nohup.out?rlkey=fuph3jypskh18umvzs1zxmk2a&dl=0

- Here is my command:

referenceDir=$(readlink -e /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/rawdata)
export SINGULARITY_BIND=${referenceDir}
files=$(ls ${referenceDir}/reads/*.fq.gz | paste -sd ',' -)

bash stereoPipeline_v7.1.sh -splitCount 16 \
    -maskFile ${referenceDir}/mask/C04289G213.barcodeToPos.h5 \
    -fq1 $files \
    -refIndex /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/genome/Cla/ \
    -genomeFile /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/genome/Cla/Lachesis_assembly_changed.fa \
    -speciesName Cunninghamia_lanceolata \
    -tissueType Xylem \
    -annotationFile /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/genome/Cla/Chr_genome_final_gene.gtf \
    -outDir /home/woodydrylab/FileShare/Stereoseq_raw_data/F24A040007172_PLAkisaT_0906/processed/Cla_20240910 \
    -imageRecordFile ${referenceDir}/image/C04289G213_SC_20240812_143348_4.0.1.ipr \
    -imageCompressedFile ${referenceDir}/image/C04289G213_SC_20240812_143348_4.0.1.tar.gz \
    -doCellBin Y \
    -rRNARemove Y \
    -threads 12 \
    -sif /home/woodydrylab/DiskArray/guest001/JW/bin/SAW_7.1.sif

Clouate commented 2 months ago

Hi, could you check the contents of 02.count/*stat to see whether the number of reads annotated to the transcriptome is normal?

joweihsieh commented 2 months ago

Hi,

That doesn't seem normal... Could it be an issue with my GTF file?

FILTER & DEDUPLICATION METRICS

TOTAL_READS  PASS_FILTER  ANNOTATED_READS  UNIQUE_READS  FAIL_FILTER_RATE  FAIL_ANNOTATE_RATE  DUPLICATION_RATE
281677999    88411884     0                0             68.61             100.00              -nan

ANNOTATION METRICS

TOTAL_READS  MAP       EXONIC  INTRONIC  INTERGENIC  TRANSCRIPTOME  ANTISENSE
88411884     88411884  0       0         88411884    0              0
100.0        100.0     0.0     0.0       100.0       0.0            0.0
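The 100.00 and -nan values follow directly from the zero counts. A small Python sketch, assuming the rates are plain ratios of the counts reported above; the formulas are inferred from these numbers and are not taken from SAW's source:

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, or NaN if the denominator is zero."""
    if denominator == 0:
        return float("nan")
    return 100.0 * numerator / denominator


# Counts copied from the FILTER & DEDUPLICATION METRICS above.
total_reads = 281677999
pass_filter = 88411884
annotated_reads = 0
unique_reads = 0

# Assumed definitions (hypothetical, chosen because they reproduce the reported values).
fail_filter_rate = percent(total_reads - pass_filter, total_reads)
fail_annotate_rate = percent(pass_filter - annotated_reads, pass_filter)
duplication_rate = percent(annotated_reads - unique_reads, annotated_reads)

print(f"{fail_filter_rate:.2f} {fail_annotate_rate:.2f} {duplication_rate}")
```

Under these assumptions, every read that passed filtering mapped to the genome but landed in INTERGENIC, so no read was assigned to a gene and the duplication rate divides zero by zero.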

Clouate commented 2 months ago

Yes, you could use SAW checkGTF -i /path/to/gtf to check whether the GTF contains valid genes (the result is recorded in checkGTF.log in the running directory), or check 02.count/Bam2gem*.log. Another possible cause is that the chromosome names in the GTF are inconsistent with those in the FASTA; for example, one has a 'chr' prefix and the other does not.
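The chromosome-name mismatch can be checked independently of SAW before rerunning the pipeline. A minimal Python sketch, assuming an uncompressed FASTA and a tab-separated GTF; the function names are illustrative and not part of SAW:

```python
def fasta_names(lines):
    """Collect sequence names: the first token after '>' on each FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_names(lines):
    """Collect the seqname (first column) of every non-comment, non-empty GTF line."""
    names = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(fasta_lines, gtf_lines):
    """Return (shared, only_in_gtf, only_in_fasta) as sorted lists of names."""
    fa = fasta_names(fasta_lines)
    gtf = gtf_names(gtf_lines)
    return sorted(fa & gtf), sorted(gtf - fa), sorted(fa - gtf)


def compare_files(fasta_path, gtf_path):
    """Open both files and compare their sequence names."""
    with open(fasta_path) as fa, open(gtf_path) as gtf:
        return compare_names(fa, gtf)
```

Calling compare_files on the paths passed to -genomeFile and -annotationFile returns the three lists. If the shared list is empty, or the GTF-only list contains names that differ from the FASTA names only by a prefix, that would be consistent with all reads being counted as intergenic.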