Xinglab / espresso


Error running ESPRESSO_C #51

Closed: Oliverfeudj closed this issue 1 month ago

Oliverfeudj commented 2 months ago

Hello @EricKutschera, and thank you for this tool. It is my first time using it, and I am struggling a bit.

  perl ESPRESSO_S.pl -T 6 -A Homo_sapiens.GRCh38.111.gtf.gz -L test.tsv -F Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -O test_out
  ./stress1_of_3weeks.bam stress1_of_3weeks in test.tsv does not exist or is not readable!
  Don't know what format ./stress1_of_3weeks.bam stress1_of_3weeks is!
  Perl exited with active threads:
          6 running and unjoined
          0 finished and unjoined
          0 running and detached

The BAM file looks good: it is a minimap2 alignment converted to BAM with samtools. I created a TSV with two columns, the first being the path to the BAM file and the second being the sample name. Can you please help me out here?

Thank you!

EricKutschera commented 2 months ago

Here's the line for that error: https://github.com/Xinglab/espresso/blob/v1.4.0/src/ESPRESSO_S.pl#L180

It's reading the -L (--list_samples) argument. It expects each line of that file to have tab-separated (\t) columns, and the values in the first column should end with either sam or bam. Based on the error message, it looks like your file might use spaces as the separator instead of tabs. That led to the code seeing "./stress1_of_3weeks.bam stress1_of_3weeks" as the first and only column.
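A quick way to catch this before launching ESPRESSO is to validate the sample list yourself. The sketch below is a hypothetical pre-flight check, not ESPRESSO's own code: it assumes only the rule described above (tab-separated columns, first column ending in sam or bam), and the function names are made up for illustration. Blank lines are skipped here as an assumption.

```python
from typing import List, Optional


def check_sample_list_line(line: str) -> Optional[str]:
    """Return an error message for a bad -L line, or None if it looks valid.

    Hypothetical helper: mirrors the rule from this thread, not ESPRESSO_S.pl itself.
    """
    line = line.rstrip("\r\n")
    if not line.strip():
        return None  # blank lines ignored (an assumption made for this sketch)
    columns = line.split("\t")
    path = columns[0]
    if path.endswith(("sam", "bam")):
        return None
    # A space inside the first "column" usually means spaces were used instead of tabs.
    if " " in path:
        return f"{path!r}: no tab found; columns look space-separated"
    return f"{path!r}: first column does not end with sam or bam"


def check_sample_list(text: str) -> List[str]:
    """Collect one message per problematic line, prefixed with its line number."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        message = check_sample_list_line(line)
        if message is not None:
            errors.append(f"line {number}: {message}")
    return errors


if __name__ == "__main__":
    good = "./stress1_of_3weeks.bam\tstress1_of_3weeks\n"
    bad = "./stress1_of_3weeks.bam stress1_of_3weeks\n"
    print(check_sample_list(good))  # []
    print(check_sample_list(bad))
```

The space-separated line from the original error is reported as such, while the tab-separated version passes.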

Oliverfeudj commented 2 months ago

I have solved that error, but now I get this one from ESPRESSO_C.pl. It points to the line, but I struggle to understand what the mistake is.

  stress1_of_3weeks.bam      0
  [Wed Apr 24 08:41:26 2024] Loading reference
  Worker 0 begins to scan: 
  stress1_of_3weeks.bam
  Worker 0 finished reporting.
  [Wed Apr 24 08:42:43 2024] Re-cluster all reads
  [Wed Apr 24 08:42:43 2024] Loading annotation
  [Wed Apr 24 08:42:59 2024] Summarizing annotated splice junctions for each read group
  stress1_of_3weeks.bam(0)
  Worker 0 begins to scan: 
  stress1_of_3weeks.bam
  Worker 0 finished reporting.
  [Wed Apr 24 08:43:36 2024] ESPRESSO_S finished its work.
  [Wed Apr 24 08:43:38 2024] Loading splice junction info
  Fail to get file size for Stress1_out/0/sam.list3: Bad file descriptor at ESPRESSO_C.pl line 1722.
  [Wed Apr 24 08:43:38 2024] Requesting system to split SAMLIST into 6 pieces
  Fail to get file size for Stress1_out/0/sam.list3.
  Fatal error. Aborted.

Thank you for your help!

Oliverfeudj commented 2 months ago

I noticed that my file Stress1_out/0/sam.list3 is empty, and I don't know why.

EricKutschera commented 2 months ago

"Fail to get file size for Stress1_out/0/sam.list3" would happen if that file is empty. I'm not sure what could cause "Bad file descriptor".
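Since an empty list file triggers this error, it can help to check the file before running ESPRESSO_C. The sketch below is a hypothetical check, assuming the output layout seen in this thread (for example Stress1_out/0/sam.list3); the function name is made up and nothing here is part of ESPRESSO.

```python
import os


def describe_samlist(path: str) -> str:
    """Report whether a SAM list file is missing, empty, or non-empty.

    Hypothetical helper; ESPRESSO_C performs its own size check internally.
    """
    if not os.path.exists(path):
        return f"{path} does not exist"
    size = os.path.getsize(path)
    if size == 0:
        # An empty list may mean no reads survived ESPRESSO_S filtering.
        return f"{path} is empty (0 bytes)"
    return f"{path} has {size} bytes"


if __name__ == "__main__":
    print(describe_samlist("Stress1_out/0/sam.list3"))
```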

If you're running v1.4.0, then espresso_s_summary.txt should have counts of how many reads were filtered for various reasons. Hopefully that can explain the empty output: https://github.com/Xinglab/espresso/blob/v1.4.0/src/ESPRESSO_S.pl#L1489
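With many counters in that file, a small script can point straight at the filters that removed reads. This is a hypothetical sketch: it assumes every line has the "description: count" form shown in the summary posted in this thread, and it treats any counter whose description contains "filtered" as a filter. The function names are invented for illustration.

```python
def parse_summary(text):
    """Parse 'description: count' lines into a dict (assumed espresso_s_summary.txt format)."""
    counts = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counts[name.strip()] = int(value)
    return counts


def nonzero_filters(counts):
    """Return (description, count) for every 'filtered' counter above zero, largest first."""
    filtered = [(name, n) for name, n in counts.items() if "filtered" in name and n > 0]
    return sorted(filtered, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    summary = """number of reads in output: 0
number of chrM alignments filtered: 0
number of alignments filtered for mapping quality: 5718842"""
    print(nonzero_filters(parse_summary(summary)))
    # [('number of alignments filtered for mapping quality', 5718842)]
```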

Oliverfeudj commented 2 months ago

Thank you @EricKutschera for your reply, here is my espresso_s_summary.txt file:

  number of chromosomes only in input annotation: 0
  number of chromosomes only in input FASTA: 147
  number of chromosomes in both annotation and FASTA: 47
  number of isoforms in input annotation: 252989
  number of splice junctions in input annotation: 402625
  number of high confidence splice junctions: 0
  total over all splice junctions of supporting reads: 0
  total over all splice junctions of perfect reads: 0
  number of read groups: 0
  number of reads in output: 0
  number of chrM alignments filtered: 0
  number of secondary alignments filtered: 0
  number of alignments filtered for mapping quality: 5718842
  number of alignments filtered for a long insertion: 0
  number of alignments filtered for unrecognized coordinates: 0
  number of reads filtered for missing full sequence: 0

Something seems to be wrong, but I don't know what. I even changed the input to a SAM file instead, but I still get the same error. I am attaching the header of my SAM file; maybe it's what is causing the trouble.

Thank you in advance!

[screenshot of the SAM file]

EricKutschera commented 2 months ago

It looks like all of the alignments were filtered for mapping quality ("number of alignments filtered for mapping quality: 5718842").

The three alignments from the screenshot are all unmapped (flag=4) and have mapq=0. Maybe there was an issue when running the aligner, or the reads themselves may have some issue.
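To see whether that holds for the whole file and not just the first few records, you could tally unmapped and MAPQ 0 records. The sketch below is hypothetical (the function name is invented): it reads plain SAM text, where column 2 is the bitwise FLAG (bit 0x4 means unmapped) and column 5 is MAPQ. A BAM file would first need converting to text, for example with samtools view. It does not reproduce ESPRESSO's own mapping-quality threshold.

```python
def tally_sam(lines):
    """Count records, unmapped records (FLAG & 0x4), and MAPQ 0 records in SAM text."""
    total = unmapped = mapq_zero = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        total += 1
        if flag & 0x4:
            unmapped += 1
        if mapq == 0:
            mapq_zero += 1
    return {"total": total, "unmapped": unmapped, "mapq_zero": mapq_zero}


if __name__ == "__main__":
    import sys

    # Usage (assumed): samtools view -h sample.bam | python tally_sam.py
    print(tally_sam(sys.stdin))
```

If "unmapped" is close to "total", the problem most likely lies upstream of ESPRESSO, in the alignment step or the reads.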