Closed ljwoods2 closed 3 months ago
Thanks. I'm not so sure that this is a bug, exactly. Some of the warning or error messages are a bit overly verbose and in some cases irrelevant and I should deal with that separately. If there are fusions where breakpoints are detected and only supported by the short reads, then you'll find some NA values showing up where the long read support would exist. Perhaps that's the main issue here?
best,
Brian
On Mon, Jun 24, 2024 at 12:23 PM ljwoods2 @.***> wrote:
When running CTAT-lr with short-read data, some of the output columns in fusion_predictions.tsv have empty values. I believe this might be a result of an error in FusionInspector which isn't handled correctly, since the number of rows with empty data is the same number of times the following error was thrown by FusionInspector during CTAT-lr's run (though this could be coincidence):
[939/1050 = 89.4 % done] Error - no gene spans 100M bases in length.... likely problem at /usr/local/bin/FusionInspector/util/fusion_pair_to_mini_genome_join.pl line 669.
    main::get_gene_span_info("chr8\x{9}ENSEMBL\x{9}exon\x{9}13160178\x{9}13160279\x{9}.\x{9}+\x{9}.\x{9}gene_id \"Y_RNA^ENSG"...) called at /usr/local/bin/FusionInspector/util/fusion_pair_to_mini_genome_join.pl line 436
    main::get_gene_contig_gtf("chr8\x{9}ENSEMBL\x{9}exon\x{9}13160178\x{9}13160279\x{9}.\x{9}+\x{9}.\x{9}gene_id \"Y_RNA^ENSG"..., "/home/tgenref/homo_sapiens/grch38_hg38/hg38_tempe/gene_model/"...) called at /usr/local/bin/FusionInspector/util/fusion_pair_to_mini_genome_join.pl line 230
    eval {...} called at /usr/local/bin/FusionInspector/util/fusion_pair_to_mini_genome_join.pl line 226
I attached an Excel sheet (so GitHub will accept it) with the output rows from fusion_predictions.tsv (with data stripped) to show the empty values. In the TSV, these empty values simply show up as two tabs in a row.
Here are the CTAT-lr arguments that were run (also with information stripped, sorry):
ctat-LR-fusion \
  --CPU 10 \
  --genome_lib_dir \
  -T " / .fastq.gz" \
  --left_fq " / .fastq.gz" \
  --right_fq " / .fastq.gz" \
  --output \
  --vis
This isn't a breaking issue, since when loading the TSV into a dataframe you can just filter out rows that have empty values in these columns, but I just wanted to make you guys aware in case this wasn't on your radar! I can also try to provide a more detailed minimal reproducible example if that would help, maybe using test files in your repo if possible?
CTAT-LR issue.xlsx https://github.com/user-attachments/files/15958554/CTAT-LR.issue.xlsx
Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas
My mistake, I think I misread the docs. These must be alternative splicing events for which only short read evidence exists. I'll go ahead and close.
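For anyone landing here with the same question, here is a rough sketch of how the rows with empty long-read columns could be set aside rather than dropped, since they appear to be predictions with short-read support only. It uses just the Python standard library instead of a dataframe. The column names `num_LR` and `LR_FFPM`, the inline example rows, and the helper name `split_by_long_read_support` are assumptions for illustration, so check them against the header of your own fusion_predictions.tsv.

```python
import csv
import io

# Assumed names of the long-read support columns (hypothetical; verify
# against the header line of your own fusion_predictions.tsv).
LONG_READ_COLUMNS = ("num_LR", "LR_FFPM")

# Values treated as "no long-read support": empty fields or a literal NA.
MISSING = {"", "NA"}


def split_by_long_read_support(tsv_text, lr_columns=LONG_READ_COLUMNS):
    """Split rows into (with_long_read_support, short_read_only)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    with_lr, short_read_only = [], []
    for row in reader:
        # A column that is absent or cut off in a row counts as missing.
        values = [(row.get(col) or "").strip() for col in lr_columns]
        if all(value in MISSING for value in values):
            short_read_only.append(row)
        else:
            with_lr.append(row)
    return with_lr, short_read_only


# Made-up rows, only to show the shape of the input.
example = (
    "#FusionName\tnum_LR\tLR_FFPM\tJunctionReadCount\n"
    "GENE_A--GENE_B\t12\t3.4\t20\n"
    "GENE_C--GENE_D\t\t\t7\n"
    "GENE_E--GENE_F\tNA\tNA\t5\n"
)

with_lr, short_only = split_by_long_read_support(example)
print(len(with_lr), "rows with long-read support")
print(len(short_only), "rows with short-read support only")
```

To run this on a real file, pass the file contents, for example `split_by_long_read_support(open(path).read())`, where `path` points at your own output.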