RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0

add --avg and --std arguments and defaults for kallisto single-end reads #44

Closed alexvpickering closed 4 years ago

alexvpickering commented 4 years ago

Fixes #43

Previous versions of arcasHLA passed the read length and its standard deviation to kallisto for single-end reads, which is incorrect. kallisto expects the mean fragment length and its standard deviation, both of which must be determined experimentally.

The kallisto manual states: "Typical Illumina libraries produce fragment lengths ranging from 180–200 bp but it’s best to determine this from a library quantification with an instrument such as an Agilent Bioanalyzer." It also gives an example with "-l 200 -s 20". These values are therefore chosen as sane defaults for --avg and --std in the absence of experimental information.
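As a rough illustration of the intended behaviour, the sketch below wires `--avg` and `--std` options (defaulting to 200 and 20) into a kallisto command line for single-end reads. The function names `build_parser` and `kallisto_command` are hypothetical and not taken from arcasHLA. The `kallisto quant --single -l ... -s ...` form follows the example in the kallisto manual; the exact subcommand and arguments arcasHLA uses are not shown in this PR and may differ.

```python
import argparse


def build_parser():
    """Hypothetical CLI with fragment-length options for single-end reads."""
    parser = argparse.ArgumentParser(description="single-end kallisto sketch")
    parser.add_argument("--single", action="store_true",
                        help="treat input as single-end reads")
    # Defaults mirror the "-l 200 -s 20" example in the kallisto manual.
    parser.add_argument("--avg", type=float, default=200.0,
                        help="estimated mean fragment length")
    parser.add_argument("--std", type=float, default=20.0,
                        help="estimated fragment length standard deviation")
    return parser


def kallisto_command(index, outdir, fastqs, single=False, avg=200.0, std=20.0):
    """Build (but do not run) a kallisto command as an argument list."""
    cmd = ["kallisto", "quant", "-i", index, "-o", outdir]
    if single:
        if avg <= 0 or std <= 0:
            raise ValueError("fragment length mean and sd must be positive")
        # Fragment length statistics, not read length statistics.
        cmd += ["--single", "-l", str(avg), "-s", str(std)]
    cmd += list(fastqs)
    return cmd


if __name__ == "__main__":
    args = build_parser().parse_args(["--single"])
    print(" ".join(kallisto_command("hla.idx", "out", ["reads.fq.gz"],
                                    single=args.single,
                                    avg=args.avg, std=args.std)))
```

When fragment sizes have been measured for a library (for example with a Bioanalyzer), those values would be passed through `--avg` and `--std` instead of relying on the defaults.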

This commit should also substantially improve the performance of analyze_reads, which is currently slow and memory-hungry.