Xinglab / espresso

Other
53 stars 4 forks source link

novel isoform not detected and all transcript have only one exon #64

Closed bingqiWu closed 1 month ago

bingqiWu commented 1 month ago

I used 22 samples' bam file to run ESPRESSO. there is the samples.tsv

/path/nonpss_2_YYF.sorted.bam nonpss_YYF /path/nonpss_1_WJC.sorted.bam nonpss_WJC ....... /path/pss_4_QSS.sorted.bam pss_4_QSS the command of ESPRESSO_S: perl ESPRESSO_S.pl -A /path/database/hg38.gtf -L samples.tsv -F /path/database/hg38.fa -T 8

the command of ESPRESSO_C: perl ESPRESSO_C.pl -I ./ -F /path/database/hg38.fa -X 0 -T 8

the command of ESPRESSO_Q: perl ESPRESSO_Q.pl -A /path/database/hg38.gtf -L samples.tsv.updated -V samples_compatible_isoform.tsv -T 4

each command reported "finish this work."

However, the samples_N2_R0_updated.gtf file only has annotated_isoform and each transcript includes one exon. like: chr13 annotated_isoform transcript 94713636 94715317 . + . transcript_id "ENST00000688515.1"; gene_id "ENSG00000227640.4"; chr13 annotated_isoform exon 94713636 94715317 . + . transcript_id "ENST00000688515.1"; gene_id "ENSG00000227640.4"; exon_number "1"; chr13 annotated_isoform transcript 44141155 44142017 . - . transcript_id "ENST00000690363.1"; gene_id "ENSG00000226519.3"; chr13 annotated_isoform exon 44141155 44142017 . - . transcript_id "ENST00000690363.1"; gene_id "ENSG00000226519.3"; exon_number "1";

In the samples_compatible_isoform.tsv file, all reads was classified "NCD" type. like: 91ba97b0-cd2b-49b2-b0b9-3ec13b8e1f2a pss_2_XYM NCD NA 91ba97b0-cd2b-49b2-b0b9-3ec13b8e1f2a pss_4_FXF NCD NA a53aa5cf-d726-48f6-a69a-167b886c4bb1 pss_2_XYM NCD NA 8a6279cd-cb7c-5d7d-89f7-52826244cc83 nonpss_CMM NCD NA 5159710e-bc3c-5d78-8426-f35a104ed578 pss_2_XYM NCD NA

Before that, I ran the ESPRESSO with partial samples and same command, but as the SAM files. Nothing like that happened. It still has "novel isoform" and "annotated isoform", and some transcripts has more than one exon. image

bingqiWu commented 1 month ago

espresso_q_summary.txt image

espresso_q_summary.txt of partial samples, but as the SAM files. image

EricKutschera commented 1 month ago

ESPRESSO should have no problem with bam files. The only difference is that it uses samtools view to read the alignments: https://github.com/Xinglab/espresso/blob/v1.4.0/src/ESPRESSO_S.pl#L756

The Q summary file from the run with SAM files shows FSM reads so I would expect some of the reads from the BAM file to have junctions that match exactly with the gtf file. Those reads should be easy for ESPRESSO to classify

Do those bam files just have more reads than the sam files from the successful run, or is there some other difference? ESPRESSO classifies a read as NCD when there is a splice junction in the read that ESPRESSO couldn't resolve. If all the reads are NCD then it could be that the reference gtf and fasta given to ESPRESSO don't match up with the reference files used to create those bam files

bingqiWu commented 1 month ago

Thanks for your reply. I ran ESPRESSO through a sample SAM file and BAM file in the same time in order to compare the difference. I noticed some errors before running ESPRESSO_C.pl which may cause the abnormal result. And both processes report the same error. image BLAST Database error: Error pre-fetching sequence data Failed to run "blastn -task blastn -db /p300s/gaoyuan_group/wubingqi/data/gland_5th_bef/espresso_ZLH//0/blast_0//current_db -query /p300s/gaoyuan_group/wubingqi/data/gland_5th_bef/espresso_ZLH//0/blast_0//read_group_1.fa.blast.tmp -word_size 4 -reward 5 -penalty -4 -gapopen 8 -gapextend 6 -num_threads 1 -evalue 10 -dust no -soft_masking false -outfmt "6 std btop" >> /p300s/gaoyuan_group/wubingqi/data/gland_5th_bef/espresso_ZLH//0/blast_0//read_SJ_group_1.blast". Exit code is 512 at /p300s/gaoyuan_group/wubingqi/software/espresso_v_1_4_0/src/ESPRESSO_C.pl line 2015, <$blast_sj_in_handle> line 4.

EricKutschera commented 1 month ago

This is the line for that error: https://github.com/Xinglab/espresso/blob/v1.4.0/src/ESPRESSO_C.pl#L2015

The code runs blastn right after running makeblastdb. Based on a search of the error message (BLAST Database error: Error pre-fetching sequence data) it seems like one possibility is that makeblastdb and blastn are not from the same version of blast: https://github.com/tseemann/abricate/issues/164#issuecomment-814218943

Can you check these commands to see if the blast commands are from the same install path and same version which makeblastdb, which blastn, makeblastdb -version, blastn -version

Were there any other errors before that Error pre-fetching sequence data? It could be that makeblastdb had an error which is why blastn can't read the database file

bingqiWu commented 1 month ago

I confirmed the above problem. makeblastdb command is from conda, but blastn command is from environment variable. I changed the blastn path, then it works. THANK YOU :D