SchulzLab / Aeron

Alignment, quantification and fusion prediction from long RNA reads
MIT License

FusionFinder: src/FusionFinder.cpp:41: std::__cxx11::string geneFromTranscript(std::__cxx11::string): Assertion `!match.empty()' failed. #10

Open lane-zhao opened 4 years ago

lane-zhao commented 4 years ago

Hi, I used the new version of Aeron, but when I run the "rule fusionfinder" step in Snakefile_fusion, I always get the same error. The log file is:

Fusion finder
Branch develop commit f9a9e1703e1abcf99fcf0a3ea699cf41f4d8c0d4 2019-07-09 10:39:49 +0200
load graph
load putative fusions
load reads
load partial assignments
FusionFinder: src/FusionFinder.cpp:41: std::__cxx11::string geneFromTranscript(std::__cxx11::string): Assertion `!match.empty()' failed.
Aborted (core dumped)

The format of the input file I generated is shown in the attached file: format.txt

ddurai commented 4 years ago

Hi, thank you for using the software. Could you please upload your config file? Also, did you run the quantification step first, and was it successful?

lane-zhao commented 4 years ago

Hi, thanks for your reply. I ran the quantification step first and it was successful. The steps before fusionfinder in Snakefile_fusion also succeeded, but the "fusionfinder" step always fails with the same error. The config file and the whole error log file are attached. fusionfinder_stderr_N01_cdna_hg38.txt

config.zip

maickrau commented 4 years ago

Hi,

The software is failing to parse the gene name from the transcript name. Could you please upload the fasta file you used for the reference transcripts?

lane-zhao commented 4 years ago

Hi, the reference transcript file is attached. Thanks. cDNA.txt Do you mean that the header line of each transcript must follow a specific format? Maybe I need to download the reference transcripts from Ensembl.

maickrau commented 4 years ago

Hi,

For now you can fix this either with this sed script

sed -i 's/ ENSG/ gene:ENSG/g' input/transcript_file.fa

or by searching for " ENSG" and replacing it with " gene:ENSG" in the transcript file. We'll look into more robust parsing.
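The same substitution can be sketched in Python for anyone who would rather write a corrected copy than edit the file in place. This is only an illustration of the workaround above, not part of Aeron: the function name `add_gene_prefix` is made up, and the sketch assumes that the gene ID appears in the header as a space-separated token starting with `ENSG`. Unlike the sed command, it only touches header lines (those starting with `>`).

```python
def add_gene_prefix(lines):
    """Rewrite ' ENSG' to ' gene:ENSG' in FASTA header lines only.

    Hypothetical helper for illustration; mirrors the sed workaround.
    Headers that already contain ' gene:ENSG' are left unchanged,
    because 'ENSG' is then preceded by ':' rather than a space.
    """
    fixed = []
    for line in lines:
        if line.startswith(">"):
            line = line.replace(" ENSG", " gene:ENSG")
        fixed.append(line)
    return fixed


# Small in-memory example; the header is invented for demonstration.
example = [">ENST00000492598.1 ENSG00000117122.14\n", "ACGT\n"]
print("".join(add_gene_prefix(example)), end="")
```

To apply it to a real file, read the lines from the transcript FASTA, pass them through the function, and write the result to a new file so the original stays available for comparison.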

Br1anChou commented 4 years ago

Hi lane-zhao, maickrau, I had the same problem in step 3, fusion-gene detection. The error is:

FusionFinder: src/FusionFinder.cpp:41: std::__cxx11::string geneFromTranscript(std::__cxx11::string): Assertion `!match.empty()' failed.
Command terminated by signal 6

I have downloaded the transcript file from ensembl.org. The format is,

ENST00000557168.1 cdna chromosome:GRCh38:14:22168429:22168988:1 gene:ENSG00000259092.1 gene_biotype:TR_V_gene transcript_biotype:TR_V_gene gene_symbol:TRAV30 description:T cell receptor alpha variable 30 [Source:HGNC Symbol;Acc:HGNC:12129]
ATGGAGACTCTCCTGAAAGTGCTTTCAGGCACCTTGTTGTGGCAGTTGACCTGGGTGAGA
AGCCAACAACCAGTGCAGAGTCCTCAAGCCGTGATCCTCCGAGAAGGGGAAGATGCTGTC

Please tell me if you have any good suggestions to solve this problem. I will be grateful for your help!

lane-zhao commented 4 years ago

Hi Br1anChou, you should run the pipeline again from the first step, using the transcript file from ensembl.org.

Br1anChou commented 4 years ago

Hi @maickrau, I downloaded the transcript file and GTF file from Ensembl and reran all commands, but it still doesn't work. I always get this error:

Error in rule fusionfinder:
jobid: 8
output: fusiontmp/unfiltered_fusions_ccs_hg38cdna_hg38.txt, fusiontmp/unfiltered_corrected_ccs_hg38cdna_hg38.txt
log: fusiontmp/fusionfinder_stderr_ccs_hg38cdna_hg38.txt, fusiontmp/fusionfinder_stdout_ccs_hg38cdna_hg38.txt

RuleException:
CalledProcessError in line 98 of /software/Aeron/Snakefile_fusion:
Command ' set -euo pipefail; /usr/bin/time -v Binaries/FusionFinder input/hg38.gfa fusiontmp/loose_gene_fusion_ccs_hg38cdna_hg38.txt fusiontmp/exactmatrix_ccs_hg38cdna_hg38.txt output/aln_hg38cdna_hg38_full_length.gam input/ccs.fq 1 1.0 1 1 3 fusiontmp/unfiltered_fusions_ccs_hg38cdna_hg38.txt fusiontmp/unfiltered_corrected_ccs_hg38cdna_hg38.txt 1> fusiontmp/fusionfinder_stdout_ccs_hg38cdna_hg38.txt 2> fusiontmp/fusionfinder_stderr_ccs_hg38cdna_hg38.txt ' returned non-zero exit status 6.
File "/software/Aeron/Snakefile_fusion", line 98, in __rule_fusionfinder
File "/software/python3.6/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /software/Aeron/.snakemake/log/2020-03-29T031559.394489.snakemake.log

I checked fusiontmp/fusionfinder_stderr_ccs_hg38cdna_hg38.txt and still found the same error: FusionFinder: src/FusionFinder.cpp:41: std::__cxx11::string geneFromTranscript(std::__cxx11::string): Assertion `!match.empty()' failed.

Please help me, and I hope this software will work successfully as soon as possible.

defendant602 commented 3 years ago

I got a similar error because of the header line format in the transcript.fasta file. It looks like the header line must have the format ">ENST00000492598.1 gene:ENSG00000117122.14".
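Before rerunning the pipeline, it may help to check which headers lack a `gene:` field. The sketch below does that in Python. It is based only on the header format reported in this thread: the pattern, the name `headers_missing_gene`, and the idea that any `gene:<id>` token after the transcript ID is sufficient are assumptions, and the actual parsing in FusionFinder may be stricter or different.

```python
import re

# Assumption drawn from this thread only: a usable header starts with '>',
# then a transcript ID, and later contains a whitespace-separated
# 'gene:<id>' field. The real FusionFinder parser may differ.
HEADER_WITH_GENE = re.compile(r"^>\S+.*\sgene:\S+")


def headers_missing_gene(lines):
    """Return (line_number, header) pairs for headers without a gene: field."""
    missing = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") and not HEADER_WITH_GENE.match(line):
            missing.append((number, line.rstrip("\n")))
    return missing


# Invented two-record example: the second header has no 'gene:' prefix.
records = [
    ">ENST00000492598.1 gene:ENSG00000117122.14\n",
    "ACGT\n",
    ">ENST00000492598.1 ENSG00000117122.14\n",
    "ACGT\n",
]
for number, header in headers_missing_gene(records):
    print(number, header)
```

An empty result only means every header contains a `gene:` token. It does not guarantee that FusionFinder will accept the file.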