JY-Zhou / FreePSI

An alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome
GNU General Public License v3.0
10 stars 2 forks source link

Discard BED12 records with all exon less than kmer length #4

Open mdshw5 opened 6 years ago

mdshw5 commented 6 years ago

I've encountered an issue with the freePSI quant command which causes malformed JSON output when a BED12 entry has only exon with length of less than the kmer (read) length.

You have code that checks the length of exons before pushing them to a list, but it does not handle the case when this routine creates an empty list.

https://github.com/JY-Zhou/FreePSI/blob/fbe32ebade20503f45b8c8c8bed168f720d2ae3e/src-refined/KmerHash.cpp#L334-L342

The effect of this appears to be that serialization of the results to a JSON list creates a malformed list structure:

...
 [
  0,
  0
 ],
 [ <-
,
 [
  0,
  0,
  0
]

In EMAlgorithm.cpp you're writing your list as JSON and iterating over exons and have a gene increment check (line 815) but no check for gene start (line 807).

https://github.com/JY-Zhou/FreePSI/blob/fbe32ebade20503f45b8c8c8bed168f720d2ae3e/src-refined/EMAlgorithm.cpp#L807-L816

You might consider fixing the IO portion of EMAlgorithm::computePSI to create empty lists when necessary.

ArashDepp commented 2 years ago

Hii, sorry to write here, I wrote to their repo owner two days ago but have not received a response yet But if you still have the refreshed memories of this topic, do you think strand is properly taken care of in the following snippet from PSI (that you linked above) int exonst = genest + boost::lexical_cast<int>(subst[i]); int exoned = exonst + boost::lexical_cast<int>(sublen[i]);

should not the exonst and exoned assignments change depending on the strand. For negative strands, shouldnt it be genest - subst[i]

Could you please take some time to respond?

Thank you.