Discard BED12 records with all exon less than kmer length

JY-Zhou / FreePSI

An alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome

GNU General Public License v3.0

10 stars 2 forks source link

I've encountered an issue with the freePSI quant command which causes malformed JSON output when a BED12 entry has only exon with length of less than the kmer (read) length.

You have code that checks the length of exons before pushing them to a list, but it does not handle the case when this routine creates an empty list.

https://github.com/JY-Zhou/FreePSI/blob/fbe32ebade20503f45b8c8c8bed168f720d2ae3e/src-refined/KmerHash.cpp#L334-L342

The effect of this appears to be that serialization of the results to a JSON list creates a malformed list structure:

...
 [
  0,
  0
 ],
 [ <-
,
 [
  0,
  0,
  0
]

In EMAlgorithm.cpp you're writing your list as JSON and iterating over exons and have a gene increment check (line 815) but no check for gene start (line 807).

https://github.com/JY-Zhou/FreePSI/blob/fbe32ebade20503f45b8c8c8bed168f720d2ae3e/src-refined/EMAlgorithm.cpp#L807-L816

You might consider fixing the IO portion of EMAlgorithm::computePSI to create empty lists when necessary.

JY-Zhou / FreePSI

Discard BED12 records with all exon less than kmer length #4