broadinstitute / poolq

The Genetic Perturbation Platform's tool for deconvoluting and quantifying the results of pooled screens

Demultiplexed FASTQ support #23

Closed: mtomko closed this issue 1 year ago

tmgreen commented 1 year ago

PQ4 might be a good time to make the switch from Array[Char] to Array[Byte] for all sequence handling. With COMPACT_STRINGS under the hood (since Java 9, a String containing only Latin-1 characters is stored as one byte per character), it might be a lot faster.
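A minimal sketch of what "one byte per base" could look like, assuming reads are plain ASCII (A, C, G, T, N). The `SeqBytes` object and its methods are hypothetical helpers for illustration, not PoolQ's API.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helpers (not PoolQ code): keep a read as one byte per base
// and only build a String when one is actually needed.
object SeqBytes {
  // An ASCII sequence like "ACGT" needs one byte per base; Array[Char] would use two.
  def fromString(s: String): Array[Byte] = s.getBytes(StandardCharsets.US_ASCII)

  // Decoding US-ASCII bytes yields a Latin-1-only String, which a JDK 9+ runtime
  // with compact strings enabled stores internally as one byte per character.
  def toStr(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.US_ASCII)

  // Work directly on the bytes: count 'N' bases without allocating a String.
  def countN(bytes: Array[Byte]): Int = bytes.count(_ == 'N'.toByte)
}
```

For example, `SeqBytes.countN(SeqBytes.fromString("ANNGT"))` scans the raw bytes and never creates an intermediate String.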

mtomko commented 1 year ago

I don't know about COMPACT_STRINGS. I had just been agonizing over the choice between String and Array[Char] in PQ4 and had concluded that using Array[Char] would be a huge hassle because of how Scala/Java treat arrays (reference equality and identity-based hash codes rather than content-based ones; to get value semantics, I'd basically have to re-implement String, I think). I may be able to traffic in Array[Byte] and convert to String when it comes time to look things up in the Reference, though.
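The array-equality problem, and one way around it short of re-implementing String, can be sketched as follows. `ArrayEquality` and the `Bases` wrapper are hypothetical names for illustration; the wrapper only delegates equality and hashing to `java.util.Arrays.equals` and `java.util.Arrays.hashCode`, which compare array contents.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

object ArrayEquality {
  // Arrays compare by reference: two arrays with identical contents are not ==,
  // and their hashCodes are identity-based, so they make poor Map keys.
  def sameContentsEqual: Boolean = {
    val a = Array[Byte](65, 67)
    val b = Array[Byte](65, 67)
    a == b // false: different array instances
  }

  // A minimal value wrapper (hypothetical, not PoolQ code) that restores
  // content-based equality and hashing for Array[Byte].
  final class Bases(val bytes: Array[Byte]) {
    override def equals(other: Any): Boolean = other match {
      case that: Bases => Arrays.equals(bytes, that.bytes)
      case _           => false
    }
    override def hashCode: Int = Arrays.hashCode(bytes)
    override def toString: String = new String(bytes, StandardCharsets.US_ASCII)
  }
}
```

With the wrapper, a `Map[ArrayEquality.Bases, Int]` built from one byte array can be queried with a different array holding the same bases. The caveat is that the wrapped array must not be mutated after it is used as a key, since its hash would change.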

mtomko commented 1 year ago

Another PQ4 thought: I'm currently getting FASTQ records as Array[String] (4 elements in the array, 1 per line). It would be interesting to read them into Array[Array[Byte]], but then I'd have to implement a lot of the low-level processing on my own. It might be a worthwhile exercise, but it's definitely dipping into I/O territory that I don't have much experience with.
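For a sense of how much low-level work reading into Array[Array[Byte]] involves, here is a minimal sketch. `ByteFastqReader` is a hypothetical class, not PoolQ's reader, and it assumes strict 4-line records with `\n` line endings (a trailing `\r` is stripped). It does not handle multi-line sequences or validate the `@`/`+` markers.

```scala
import java.io.{BufferedInputStream, ByteArrayOutputStream, InputStream}

// Hypothetical sketch: pull FASTQ records straight out of a byte stream,
// one inner Array[Byte] per line, without decoding anything into Strings.
final class ByteFastqReader(in: InputStream) {
  private val stream = new BufferedInputStream(in, 1 << 16)
  private val buf = new ByteArrayOutputStream(256)

  // Reads one line without its terminator; returns null at end of input.
  private def readLine(): Array[Byte] = {
    buf.reset()
    var b = stream.read()
    if (b == -1) return null
    while (b != -1 && b != '\n') {
      buf.write(b)
      b = stream.read()
    }
    val line = buf.toByteArray
    // Tolerate Windows-style "\r\n" endings.
    if (line.nonEmpty && line.last == '\r') line.init else line
  }

  // Returns the next 4-line record, or None at a clean end of input.
  def next(): Option[Array[Array[Byte]]] = {
    val header = readLine()
    if (header == null) None
    else {
      // Arguments are evaluated left to right, so lines stay in file order.
      val record = Array(header, readLine(), readLine(), readLine())
      if (record.contains(null)) throw new IllegalStateException("truncated FASTQ record")
      Some(record)
    }
  }

  def close(): Unit = stream.close()
}
```

The buffering is delegated to `BufferedInputStream`, so each `read()` call is usually a cheap in-memory operation. A faster reader would scan its own byte buffer for newlines instead of copying byte by byte through a `ByteArrayOutputStream`.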