Closed Winshipe closed 1 year ago
Hi Eamon,
Thanks for posting this interesting problem. So that I can troubleshoot, would you mind 1) letting me know the command that you ran to generate this file, 2) letting me know the version of inStrain that you're running, and 3) attaching the input genes file, the output gene_info file, and the log file?
Thanks again, Matt
Hi Matt,
I inherited this project from someone else, and unfortunately it seems the original log files have been lost. I haven't seen this issue in the re-run analysis (still using IS v1.5.7), and I haven't been able to reproduce it with the test data either (using IS v1.7.1). My pet theory is that it could be due to improper filtering of the BAM files?
Thanks and sorry for the bother, Eamon
OK, interesting. I'm glad it's not too much of a problem anymore, but please reach out if you hit it again.
Best, Matt
Hi Matt,
I'm getting an indexing error in the IS_gene_info.tsv file. It seems that after certain genes are removed from the analysis, there can be a mismatch between the gene name and its coordinates. In the example below we see that gene 40 in the gene_info file assumes the coordinates of gene 39 in the FNA after genes 36-38 in the FNA are excluded. I've seen this issue repeated in different contigs and in different gene_info files.
What conditions lead to genes being excluded from the file? Some genes in the test data are excluded (e.g. N5_271_010G1_scaffold_2_26), but they don't seem to suffer from this indexing issue.
Here in the gene info file we have (some numbers truncated for readability):
whereas the FNA file looks like this:
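For anyone wanting to check their own files for this kind of name/coordinate mismatch, here is a rough Python sketch. It rests on assumptions, not on confirmed inStrain behaviour: that the FNA headers are Prodigal-style (`>name # start # end # strand # ...`), that the gene_info table has columns named `gene`, `start` and `end`, and that the two files may differ by a fixed coordinate offset (the `offset` parameter, which you may need to set to 1 if one file is 0-based and the other 1-based). The function names are hypothetical helpers, not part of inStrain.

```python
import csv


def parse_prodigal_headers(fna_lines):
    """Map gene name -> (start, end) from Prodigal-style FASTA headers.

    Assumes headers look like: >name # start # end # strand # ...
    """
    coords = {}
    for line in fna_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = [f.strip() for f in line[1:].split("#")]
        if len(fields) < 3:
            continue  # header is not in the assumed format
        coords[fields[0]] = (int(fields[1]), int(fields[2]))
    return coords


def load_gene_info(path):
    """Read a tab-separated gene_info file into a list of dict rows."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def find_mismatches(gene_info_rows, fna_coords, offset=0):
    """Return (gene, observed, expected) for genes whose coordinates differ.

    `offset` is added to the gene_info coordinates before comparing
    (assumption: any indexing difference is a constant shift).
    Genes missing from the FNA are skipped rather than reported.
    """
    mismatches = []
    for row in gene_info_rows:
        gene = row["gene"]
        if gene not in fna_coords:
            continue
        observed = (
            int(float(row["start"])) + offset,
            int(float(row["end"])) + offset,
        )
        expected = fna_coords[gene]
        if observed != expected:
            mismatches.append((gene, observed, expected))
    return mismatches
```

Typical use would be `find_mismatches(load_gene_info("IS_gene_info.tsv"), parse_prodigal_headers(open("genes.fna")))`, where `genes.fna` is a placeholder for your own genes file. If every gene is reported as a mismatch, the offset is probably wrong. If only a handful are, and each one carries the coordinates of a neighbouring gene, that matches the pattern described here.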
Thanks for your help, Eamon