Open aclum opened 7 years ago
@aclum can you be more specific? fasta.gz fastq.gz?
We already support .fasta.gz and .dexta. I doubt anything out there is smaller or faster than .dexta.
Is TwoBit what you had in mind?
I was just discussing that format with a colleague. To us, that is a broken standard, but not terrible.
There's a 'standard' for storing FASTA files as .2bit files for compression, but I am befuddled as to why they chose T-00, C-01, A-10, G-11. Had they chosen codes in which A/T and C/G are bitwise complements of each other, certain operations would become much simpler (e.g., you can reverse complement a kmer stored in a 32 or 64 bit value looplessly with bitops), and it just makes more sense. I use A-00, C-01, G-10, T-11, which is easy to remember because the bases are in alphabetical order.
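To make the loopless reverse complement concrete, here is a rough Python sketch under the A-00, C-01, G-10, T-11 encoding. It assumes the kmer is packed into the low 2k bits of a 64-bit word with the first base in the most significant position; the names `encode`, `decode`, and `revcomp` are made up for illustration and do not come from FALCON or the .2bit tools.

```python
MASK64 = (1 << 64) - 1          # Python ints are unbounded, so emulate a 64-bit word
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode(kmer):
    """Pack a kmer (k <= 32) into an int, first base in the highest 2-bit group."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value

def decode(value, k):
    """Unpack k bases from the low 2k bits of value."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))

def revcomp(value, k):
    """Reverse complement without looping over the bases."""
    # Complement: A<->T and C<->G are bitwise complements in this encoding.
    x = ~value & MASK64
    # Reverse the order of the 32 two-bit groups with a fixed series of swaps.
    x = ((x >> 2) & 0x3333333333333333) | ((x & 0x3333333333333333) << 2)
    x = ((x >> 4) & 0x0F0F0F0F0F0F0F0F) | ((x & 0x0F0F0F0F0F0F0F0F) << 4)
    x = ((x >> 8) & 0x00FF00FF00FF00FF) | ((x & 0x00FF00FF00FF00FF) << 8)
    x = ((x >> 16) & 0x0000FFFF0000FFFF) | ((x & 0x0000FFFF0000FFFF) << 16)
    x = ((x >> 32) | (x << 32)) & MASK64
    # The kmer now sits in the top 2k bits; shift it back down,
    # which also discards the complemented padding bits.
    return x >> (64 - 2 * k)
```

For example, `decode(revcomp(encode("ACGTT"), 5), 5)` gives `"AACGT"`. The helper loops in `encode` and `decode` are only there for reading and printing; the reverse complement itself is a fixed number of bit operations regardless of k.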
I was thinking of .gz. We tried it with the smrtlink code and it didn't work. When was support for this added?
On Wed, Apr 12, 2017 at 2:06 PM, Christopher Dunn notifications@github.com wrote:
Is TwoBit https://genome.ucsc.edu/goldenpath/help/twoBit.html what you had in mind?
This is a feature request. It would be useful if FALCON supported commonly used compression formats for input files.