Koeng101 / dnadesign

A Go package for designing DNA.

compressdna #51

Closed Koeng101 closed 6 months ago

Koeng101 commented 8 months ago

This PR adds the compressdna algorithm, as well as the ability to base58 encode byte arrays.
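For readers unfamiliar with base58, here is a minimal sketch of encoding a byte array. It is not the PR's code: the function name `encodeBase58` is hypothetical, and the Bitcoin alphabet is an assumption, since the thread does not say which alphabet the package uses.

```go
package main

import (
	"fmt"
	"math/big"
)

// alphabet is the Bitcoin base58 alphabet. This is an assumption; the PR
// may use a different alphabet or ordering.
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// encodeBase58 (hypothetical name) treats data as a big-endian integer and
// writes it in base 58. Leading zero bytes are kept as leading '1' characters.
func encodeBase58(data []byte) string {
	n := new(big.Int).SetBytes(data)
	base := big.NewInt(58)
	zero := big.NewInt(0)
	rem := new(big.Int)

	var out []byte
	for n.Cmp(zero) > 0 {
		n.DivMod(n, base, rem) // n = n / 58, rem = n % 58
		out = append(out, alphabet[rem.Int64()])
	}
	// Each leading zero byte becomes one leading '1'.
	for _, b := range data {
		if b != 0 {
			break
		}
		out = append(out, alphabet[0])
	}
	// Digits were produced least-significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(encodeBase58([]byte("hello world")))
}
```

Going through `math/big` keeps the sketch short; a production encoder would more likely do the base conversion on the byte slice directly to avoid big-integer allocations.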

Koeng101 commented 8 months ago

I still need to add more examples and such, but I've added blowq, which implements heavy lossless compression for FASTQ files.

Koeng101 commented 8 months ago

@CamelCaseCam the 5-character alphabet added here is for losslessly compressing FASTQ files from SRA, which include N nucleotides, so that we can get on the leaderboard here: https://github.com/godotgildor/fastq_compression_comparison

(also, this is pretty crap code right now, still in heavy iteration mode)
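One way a 5-character alphabet (A, C, G, T, N) can be packed is sketched below. This is an illustration under stated assumptions, not the PR's actual layout: it stores three bases per byte as base-5 digits (5^3 = 125 fits in one byte), and the names `pack5` and `unpack5` as well as the symbol order are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// letters fixes the digit value of each base (A=0 ... N=4).
// The order is an assumption made for this sketch.
const letters = "ACGTN"

// pack5 stores three bases per byte as base-5 digits. The sequence length
// must be stored separately so padding in the last byte can be dropped.
func pack5(seq string) ([]byte, error) {
	out := make([]byte, 0, (len(seq)+2)/3)
	for i := 0; i < len(seq); i += 3 {
		var v byte
		for j := 0; j < 3; j++ {
			v *= 5
			if i+j >= len(seq) {
				continue // pad the final byte with digit 0
			}
			d := strings.IndexByte(letters, seq[i+j])
			if d < 0 {
				return nil, fmt.Errorf("unsupported base %q", seq[i+j])
			}
			v += byte(d)
		}
		out = append(out, v)
	}
	return out, nil
}

// unpack5 reverses pack5 and keeps only the first n bases.
func unpack5(packed []byte, n int) (string, error) {
	out := make([]byte, 0, n)
	for _, v := range packed {
		if v > 124 {
			return "", fmt.Errorf("invalid packed byte %d", v)
		}
		for _, d := range [3]byte{v / 25, v / 5 % 5, v % 5} {
			if len(out) < n {
				out = append(out, letters[d])
			}
		}
	}
	return string(out), nil
}

func main() {
	packed, err := pack5("ACGTN")
	if err != nil {
		panic(err)
	}
	seq, err := unpack5(packed, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(packed), seq)
}
```

This costs about 2.67 bits per base, compared with 3 bits for a fixed-width 5-symbol code and 2 bits for a plain ACGT code.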

CamelCaseCam commented 8 months ago

Okay, I've got a way better way to do this. We can use Huffman coding to encode the qualities in the FASTQ file. Each nucleotide will be encoded as the two-bit ID plus N bits for the code for the quality score. This would be better, as it would encode more common quality values with fewer bits and account for the fact that many sequencers use fewer than 8 bits to represent quality. Once we pack the bits, we should get a very compact file.
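A rough sketch of the Huffman idea, assuming one code table is built per quality string (the thread does not say whether the table would be per read or per file). The function name `codeLengths` is hypothetical. It only computes code lengths, which is enough to estimate the packed size at two bits per base plus the quality code.

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is a Huffman tree node; leaves have no children.
type node struct {
	weight      int
	symbol      byte
	left, right *node
}

// nodeHeap is a min-heap of nodes ordered by weight.
type nodeHeap []*node

func (h nodeHeap) Len() int            { return len(h) }
func (h nodeHeap) Less(i, j int) bool  { return h[i].weight < h[j].weight }
func (h nodeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *nodeHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := old[len(old)-1]
	*h = old[:len(old)-1]
	return n
}

// codeLengths (hypothetical name) returns the Huffman code length in bits
// for every quality character that occurs in qual.
func codeLengths(qual string) map[byte]int {
	counts := map[byte]int{}
	for i := 0; i < len(qual); i++ {
		counts[qual[i]]++
	}
	h := &nodeHeap{}
	for s, c := range counts {
		*h = append(*h, &node{weight: c, symbol: s})
	}
	heap.Init(h)

	lengths := map[byte]int{}
	if h.Len() == 0 {
		return lengths
	}
	if h.Len() == 1 {
		lengths[(*h)[0].symbol] = 1 // a lone symbol still needs one bit
		return lengths
	}
	// Repeatedly merge the two lightest nodes until one tree remains.
	for h.Len() > 1 {
		a := heap.Pop(h).(*node)
		b := heap.Pop(h).(*node)
		heap.Push(h, &node{weight: a.weight + b.weight, left: a, right: b})
	}
	var walk func(n *node, depth int)
	walk = func(n *node, depth int) {
		if n.left == nil {
			lengths[n.symbol] = depth
			return
		}
		walk(n.left, depth+1)
		walk(n.right, depth+1)
	}
	walk((*h)[0], 0)
	return lengths
}

func main() {
	qual := "GGGGGGGGFF#"
	lengths := codeLengths(qual)
	bits := 0
	for i := 0; i < len(qual); i++ {
		bits += 2 + lengths[qual[i]] // two-bit base ID plus quality code
	}
	fmt.Println(bits, "bits for", len(qual), "bases")
}
```

A real encoder would also have to store the code table (or use canonical codes) so that the decoder can rebuild it, and that overhead is not counted here.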

CamelCaseCam commented 8 months ago

Another idea: dynamically use run-length encoding. Encode an "extra" quality character to represent toggling run-length encoding. Insert this character whenever switching it on/off outweighs the cost of the character (so if it takes up 4 bits, for example, switch when at least 8 bits are saved by doing so).
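The toggle decision can be sketched as follows. This is a simplified illustration, not a proposed format: it assumes fixed-width symbols and counts, decides run by run, and lets adjacent runs share one pair of toggle markers. All names (`encodeRLE`, `decodeRLE`, `token`) and bit widths are hypothetical.

```go
package main

import "fmt"

// Token kinds in the encoded stream.
const (
	literal = iota // one quality character written as-is
	run            // a character plus a repeat count
	toggle         // the "extra" character that switches run-length mode
)

type token struct {
	kind  int
	sym   byte
	count int
}

// encodeRLE switches to run-length mode only when a run costs fewer bits
// that way, including one toggle to enter and one to leave the mode.
func encodeRLE(qual string, symBits, countBits, toggleBits int) []token {
	maxRun := 1<<countBits - 1 // longest run a count field can hold
	var out []token
	for i := 0; i < len(qual); {
		j := i
		for j < len(qual) && qual[j] == qual[i] && j-i < maxRun {
			j++
		}
		n := j - i
		literalCost := n * symBits
		rleCost := symBits + countBits + 2*toggleBits
		if rleCost < literalCost {
			if len(out) > 0 && out[len(out)-1].kind == toggle {
				out = out[:len(out)-1] // already in run-length mode: stay in it
			} else {
				out = append(out, token{kind: toggle})
			}
			out = append(out, token{kind: run, sym: qual[i], count: n}, token{kind: toggle})
		} else {
			for k := 0; k < n; k++ {
				out = append(out, token{kind: literal, sym: qual[i]})
			}
		}
		i = j
	}
	return out
}

// decodeRLE expands tokens back into the original quality string.
func decodeRLE(tokens []token) string {
	var out []byte
	for _, t := range tokens {
		switch t.kind {
		case literal:
			out = append(out, t.sym)
		case run:
			for k := 0; k < t.count; k++ {
				out = append(out, t.sym)
			}
		}
	}
	return string(out)
}

func main() {
	qual := "BBBCCGGGGGGGGGGGGFGGGGGGGCEGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGG"
	tokens := encodeRLE(qual, 6, 8, 6)
	fmt.Println(len(qual), "characters ->", len(tokens), "tokens; lossless:", decodeRLE(tokens) == qual)
}
```

Combined with Huffman coding, the per-symbol costs would be variable rather than fixed, so the comparison would use the actual code lengths instead of `symBits`.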

Koeng101 commented 8 months ago

> We can use Huffman coding to encode the qualities in the FASTQ file.

Love it! This could really reduce the size. The other thing I was thinking about is compressing common things into flags (streaming reads is extremely important). For example, here are a few reads from the SRA benchmark:

@SRR2962693.1 1 length=252
TAGGTAACTGGCTATATGAACTTGTAGAAGGTGCTCATTCCAGTCCTCTTGTTCCCAGAGGCTGTGGCTCAAGGCAGCTCTCATGGGTATATTCAAATTGATTGGAGATGCCACTGGAGAGGGTATNAAANNNCCTGGGCTCCTACAGGAACAATGACACTGNCNNNNNNNNNNNNNNNNGCAGCAGCTACAAGATACCCTCTNNNNNNNNANNNNCANNCAATNTGAATATACCCATGAGAGCTGCCTNNN
+SRR2962693.1 1 length=252
BBBCCGGGGGGGGGGGGFGGGGGGGCEGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGG@FEGGGGGGGGGGGGFGBDGGGGGGGGGEEDGGGGGGGGCEB>@GG>0=DFGGGGGCFBG>EDGFCC0#<<?###;@@>FGGGGGGGGGGGGGGGGGGGGGGGG#=################===CGEGGGGGGGGFGGGGGGGG########0####::##:;;F#=:F=GGGGGGGGGG=EEG<E=GGG###
@SRR2962693.2 2 length=252
GTTTTTCCAACAGAACCAGACAGGTTTCTCCTGAAACTCTTTCATTATACGCCATGTACTGTTCATATCCTCATACATCTGCTTTGATCTTCCCCCTCCCCGCTCTCTCTCTCTAACACACACATAAGAAAATNAANTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGAGACAGAGACAGAGAGAGAGAGACAGGGAGAGGGTATATCAAGTATGAGAAGGAACAATGTGTGTATGTGTGTGTGAGA
+SRR2962693.2 2 length=252
BBC?@;DC1=101EE>CDGG0>FGFFDFGE1EDGGFBFFGGGGGGGGGGGEFGC/F@FEGCGG11=1:1>:C1<1=EFCGFEGGE11?1=11010/:C//=/8:/0000<080000000C...880<AB00;1#00#>>0=FDDGGG@>EFGGGGFCBG0<:>0<DG0>FGFGGDECFDFCG1EFFGGGFDEG=CEB0>0=:=/=C/EFG0C00;<000<00088;;C0@00C0<00<088<FFGGG.6CG/
@SRR2962693.3 3 length=252
AAACTGACTGGTAAAATACAACCCTATTGTTTCCTTTTCATTTTCAGAACAAGCAATCAATATAATTTGGCTCCAATTACAGCTAAAGCAAAAGTGGTTATTGAACTGCTTTTATCGGTCTCGGGGAGGATTGTAAATGGTTCTTTCAAACATTGATTTGTTATGCTAAACACAAACACTTACCCCGAGACCGATAAAAGCAGTTCAATAACCACTTTTGCTTTAGCTGTAATTGTAGCCAAATTATATTGA

The length=252 field could probably be encoded as a single bit in a flag (the decoder just writes out the length of the DNA). Since the comment (the '+' line) is the same as the description, we could possibly encode that as yet another flag.
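A sketch of how those two flags might be detected for one record. The flag names, bit positions, and the function `headerFlags` are hypothetical and not part of the PR. The length flag is only set when the header ends in exactly " length=" followed by the decimal sequence length, so the decoder can regenerate the suffix without loss.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical flag bits for one FASTQ record.
const (
	flagLengthSuffix uint8 = 1 << iota // header ended in " length=<len(seq)>"
	flagPlusRepeats                    // '+' line repeats the header text
)

// headerFlags returns the flags for a record and the header text that
// still has to be stored once the redundant parts are removed.
func headerFlags(header, seq, plus string) (uint8, string) {
	var flags uint8
	id := strings.TrimPrefix(header, "@")

	// The '+' line repeats the description: store one bit instead.
	if id != "" && strings.TrimPrefix(plus, "+") == id {
		flags |= flagPlusRepeats
	}

	// The length suffix can be rebuilt from the sequence itself.
	suffix := " length=" + strconv.Itoa(len(seq))
	if strings.HasSuffix(id, suffix) {
		flags |= flagLengthSuffix
		id = strings.TrimSuffix(id, suffix)
	}
	return flags, id
}

func main() {
	flags, id := headerFlags("@SRR2962693.1 1 length=4", "ACGT", "+SRR2962693.1 1 length=4")
	fmt.Printf("flags=%02b stored header=%q\n", flags, id)
}
```

Nanopore-style records with a bare '+' line and no length field would simply get no flags set, and their full header would be stored.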

Here is some data from my nanopore reads (which is mainly what I want to compress):

@44f8a8a1-fa6a-4fd8-8fde-52b19d3b5b74 runid=bb4427242f6da39e67293199a11c6c4b6ab2b141 read=9583 ch=15 start_time=2023-12-29T11:11:49.719061-08:00 flow_cell_id=AQY258 protocol_group_id=nseq28 sample_id=build3-build3gg-u11 barcode=barcode01 barcode_alias=barcode01 parent_read_id=44f8a8a1-fa6a-4fd8-8fde-52b19d3b5b74 basecall_model_version_id=dna_r10.4.1_e8.2_400bps_sup@v4.2.0
ATTGTTTTGTCTCTTCGTTCAGTTGCGTATTGCTAAGGTTAAAAGGATTGAATTTCCCGCGGGTGACACCGACACCTCGAGTGTAGGCACCATCTATGAAGACAGAGCGCTCAAGTTAGATCTTGGTACGGACGGTACGATGGTGCAAATCTTCGGGTGCTCTTTGCTGCCTTGCAT
+
%&&&&'(%$%%''()')***0005*****8:98965456::;;82321.&%%%'*')+&'%%%%&&&&'6+++**,3*)&&&()(***/9:77798<<==DGDCBCBBBAB;,***,&&&&&&'%%'('(().55555/.///DDEEAAABA?54444GSGDCE878DDF<?4)(*(
@93a5eaf2-fbcc-4a0b-af75-fb94efff92a6 runid=bb4427242f6da39e67293199a11c6c4b6ab2b141 read=45340 ch=61 start_time=2023-12-29T16:55:57.719061-08:00 flow_cell_id=AQY258 protocol_group_id=nseq28 sample_id=build3-build3gg-u11 barcode=barcode01 barcode_alias=barcode01 parent_read_id=93a5eaf2-fbcc-4a0b-af75-fb94efff92a6 basecall_model_version_id=dna_r10.4.1_e8.2_400bps_sup@v4.2.0
GGAAGACGGAGGCGCTGGTTGACGGAAGCTGTTACGGACAGTACGAATCCGTGGCTGCAGGCGAAAGTGCTAAGACCATATTGAATGACATCATACATGGAGGTAAACAGCAGAAGACAAACTTTCATCTGACGTTCAGAAAACGTCTTCGCTACCTGGTCGTTATTATGGAACCTGCAGAAGGCGGGAGACTCGTGTTCACCTTCACGATAG
+
%'-.:>=1////...5,,,,,//../))((+))*)*//0''&&'(('&&'')'''(+***)((*00186200..//66610,)(((&%%$$%%''((()--1;;/...---(''&%&&%'%%(**%'''*788<;;99:;:CBC@@;;;;;BC>=>>>FG?;;;9788852**+*)(%%%%**,2))')+*)(%&&'(*)))***&&&''*)*

Those reads can actually get very large. For example, I have a test file with this read:

@755856b8-554d-469e-a2d6-adab456670b7 runid=5d628371d05d61a22a9cb63bfa80268cf529c2b4 read=1022 ch=174 start_time=2023-06-27T17:53:12.149124-04:00 flow_cell_id=FAW55693 protocol_group_id=230627_1 sample_id=3405_2 parent_read_id=755856b8-554d-469e-a2d6-adab456670b7 basecall_model_version_id=dna_r10.4.1_e8.2_5khz_400bps_sup@v4.2.0
ATGTTACTGTTACTTCGTTCAGTTACGTATTGCTAGTCAGCGATTTTCAGCAGATGGAATTATTGGCGAAGCAGTGCCGTACACACTGGTACAGTGTTCTTATCGCCTCTATTGCGGCGTATGTTCATCGTATGACTCGCTCGAAGGAAGTGGTGCTTGGTGTGCCACTCATGGGGCGGTTAGGCTCGTGGCAATTCAAACGCCTGCAATGCGAGTGAATATTCTGCCAGTACGAGTCAGCTTTGATGAGGGGCTGGATATCAAAACCTTGGTCAAGCAGGTCAACCAAGAGTTCTCTTCTGTACGTCGTCATCAAGGCTACCGTTATGAAGAGTTACACCGAGAACTTAATCTGGTCAAAAGCAACCGTAACCTGTTTGGTCCGCTGGTGAACATTATGCCGTTTGAGTATGAGCATAAGTTTGGAGAGCTAAATTCTAAGGCACATAACCTTTCAGCGGGGCCCGTGGATGACATTTCTCTGTACTGTTATGATCTTGGCGGCGAGTTGCATGTGGATATGGATGCCAACCCCGAGCTTTATTCAGAGCGTGAAATTCAAGAACACCAGCAGCGTCTGTTCCATTTTATGAGTGACTTTTTTGCGGCTGCTCGTGAGCAGGGTGCGAGTCAGTGTAAGATCAGTGATGTCAGCATTTTGCTTGCTGGTGAACGAGAAAAAGTGATCAATACCTGGAACGATACCGCTCATTCTGTTCCAGAAACTTCCTTATCAGCCTTGATGGCAAGACAACGAATCTTGACGCCTCATGCTCCTGCACTGATTTTTCGAGAATCAAACACTCACCTATGAGCAGTTAAGTCGAAAGGTCTACTCTTTGATGAACTGGCTGTTTGCCCAAGGTGTGGAAGCAGGAGATCGCATTGCTGTTTGTGTACCGCGAAGTGAAGAGTTAATCGTTGTACAACAAGCCATTCTCGCTGCTGGTGCGGTGTATGTGCCGATTGACCCGGACTACCCAGAAGGTCGCATTCATTACATGCTGGAATCTTCTGCACCTAAGCTCGTGTTCTCGACTTCGGCATTACAATCCAAGTTGCCACACGAGTACGAGAGTAAACTTGTCGATGGTGAAACATTTCCTGCTATTTATAAGAATGTAGAGCCGTTACCACCCCAAGTTCATCCAGAACCACATCGCCGGCTTATATGATTTATACCTCCGGCTCGACTGGTAAGCCAAAAGGCGTGGTAGTGAGTCATGATGCTATCGTGAACCGATTGCTATGGATGCACGATCAATATCCGATTGATGCCAACGACCGCGTGTTACAAAAAACACCAGCGGGCTTTGATGTGTCAGTGTGGGAATTCTTCTGGCCAATGATCGTCGGTTCTTGTTTGGTGGTGGCGAAACCAGATGGGCACAAAGATCCGGTTTATCTGCAGGAAATGATCCAAAACCAAAAGATCACCACAATGCATTTCGTTCCATCGATGCTGCAGATTTTTGTGCAACAGGCGGATGCGCAACTATGTCAAAGCTTGCGCCAGGTGTTTTGTAGTGGGGAAGCCTTGCCGGTTGAGTTGGTAAACCAATATTACCAATCCTTCGATGCGCCGCTACACAACTTATATGGCCCAACCGAGGCTGCCGTCGATGTGACTTACTGGCCGAGTGAAGCCAATACGCAAGGCAGCTCGACCCCAATTGGTCGCCCGGTTTGGAATACACAGATCTACATTCTGGATGATGCGCTTAATCCGGTTCCACCGGGCGTGGTTGGTCATCTTTATATTGCGGGGCGTCAGCTTGCGCTCGGCTATCATGGTCAGCCAGAACTGACGGCAGAGCGGTTTATTGACAATCCATTTGGCCCAACAGGAAGCCGAATGTACTTATCGGGCGATCTCGCACGTTGGCGTGAAGATGGGGCGATAAAGAATACTGTGGTCGCAGTGATTTTCAGGTCAAAATCCGCGGCTTTCGTATTGAACTAGAAGAGATCGAAAATGCGTTAGCCAGCCATCCCGATGT
GGCACAAGTTGCGGTGTTGGCTCAAGAATACAGTGATGGCGATAAACGTCTGGTGGCTTACGTGACGGCAGAAGATGCGGAACAAGCTATTGATGCCGCCCAACTGCAGAAATATCTCGCAGATCCTCTGCCAGAATACATGGTACCGAGCTATTTTGTCAGCTCGATGCGTTTCCGTTAACGCCAAACGGAAAACTTGACCGTAATGCGCTGCCAAAACCCGATCTATCTGGTCAGGTCGGTACAAAAGGGCCGAGTAATTTGGTCGAAGAACGTTTGTGTAAGCTGTTCTGCCAGTTATTGGAGCTTCCTGCTGTGGGCGTTGAAGATAATTTCTTCGAGCTTGGCGGACACTCTCTGTTAGCGGCGCAGCTTATTGCGCATGTAAAAGAGATCATGGGCATTGAGCTATCACTGGCCGCGGTGTTTGAGTCTCCGACCGTTGCTGGCATTGCAGCCAAACTAAATGGTAGCGAAAGTGATGAAGCGCTTAATATGCTATTGCCACTGCGTAAACGCGAGGGTAAAGCCGCCATATTCTGTGTTCACCCTGCAGTATAAGCTGGTGTTATGCGGCGCTGACGCCAATTATTCCATCTAACATTCCTCTATATGGTGTGCAGGCTCGTAACCTTGGCGATCCTTGCTTGGCTTTGCCGAAAACGATGAAGGAGATGGCAGAAGATTATGTCGCTGCTATCCGTGAAGAACAGCCTTTCGGTCCTTACCATCTACTTGGCTGGTCTATTGGTGGCATGATTGCGCACTTGATGGCGGGTATCTTGCAGCAGCAAGGACAAGAAGTCGGTTTACTTACTTTGCTTGATTCCTACCCTACAGAGCAGTGGCAGACCATGAATCCACCTGGTGAAGAGCAAGCTCTGGGGGCATTGATTCGAATGGCGGGCGTTGAGTTTGATGAATCAGCGCATAGCAGCATTACTAAGCCAGAAGTTATCGACATCTTACAAGATGCGGGATCGTCGATGGCTCACCTGAGCTCCGAGACCATCTCTGCGATGATCGAAGTGGTGATCAACAACAACCATCGCGTACGAGATTCGGTAGATTATCGCTATCAGGGTGATATGTTGTTCTTCAATGCTGAGAAGCCACCAGAAGAGTCATTCTTGGATCGCAATGGTTGGTTTAATTACATGGATGGTGAAATCAATGTCGTGGATGTTGATTGTATTCACCGAGATATGATGCGACCAGACATGTTACGTCTTATTGAAACAGAAATAGTTTGAAGAACTGAGAGCGAAATTTGATGCTTGTTTAATTCTGGTCATCCGTTGTTACCAGAGAATAAGTGGGAGGGGAGCATTTCCCCTCTGACTTTCTTCAAAGGCAAACTGGAGTCTGGCTCTGTTTTGGCTATTGGTGAAGCGCATTGGTATGCAGACCTTTTTGAGCAAATGACCCAGGTTTTGCTCTCTGAGGCGTTGGATGGTGTTTTTACTCATCTGTTTGTCGAGTTTCGTCATGCAAGCATCAAAAAATGTTGGATGATTATCTGTCAGGAGGTGACATTTCTGATGATGAGCTGGCTGCAGTCTGGCTAGATTCCATCGCTTTTCCTGCCTGGATGCACCCTTGCTATGGCGAGTTTTTCCGACGCTTAAAGCAAGCAAACGCACAGCGTAACGCTCCTATCAAAGTTGTACTGACCGAGCCTCCGTTCGATTGGCAAGAACCCGTCATCCGTCACAGCTTGCCCAGCTTAATGCAGAGCGAGACCAGGCTCTGGTTGAAGGTATTGAAGTCATCAAGCGCCAAAATGACAAAGGTGTGGTCGTGCTCGTCGGCGCAAGGCATATTCTCAAGCGTTCTCCTACATTTCGCTCAACTCGCCATCATCCTTTTGGGGATTGTGGCACCCGCACAGAGATTTGGTGTCCAATACGTTTCGGTATGGCCACATATGTTGCCGACAACGCTTGAACAAGACTCGCTGGAATTGGGTATCTATTTCACCCAACAGCCAAGGTTTAGCC
AGATGAACTTTGCCCGATCTGGTTCCTAACAAACCTACGGTGAATCCTTACTCTAGACACCCGTTGATCAGTTCGTGAAGTGTTTCTGGTACTTCAGCACCAAACCCGGCAGTTGAGTGCGGCTGGCGTTGAAATTCCTCAGAAGTGGAAAGCGCAGCTAAAGAAGAGGCTACCGCTTGTGAATCAGCGCCAGAGAGTGGTAATTCAAAAGGTGATTGAATAGGGACTCTTGGTAATTTATCTGGTTAAAAAAGCACAGCGAGAAAACGGTTTGTTCGGTAAAGCAATAATAACCGTGAAACGTTAAATATAAGGGCAGGGCGGATTGGAGATCACTTCGGTTATCTCCAATCAACTCAAACTGAAACTGCCCGCTGTTTTCCGTCTCGGGGTGCAGCGCAATTTCGGCATCTAAAAAGTCGAAATATACCCTCACGAAAGGAAACAGACATTTGAATAAACTCTCTTTTGATACTGAAGATAAGAATGTTTCGGATGGCGTCTGATTTCTGTAGGCGATGTTGTTCATTGGGTGTGTAAATCGCCAGGAAGAAGGTGAGCGGCAAGATTTTGATTGTTTTCAACATCAATCCCAATACTTGGCACATCATTTTTAACCGCACACGCCGCCAAACACCAGGGAGTCTGTATGACTAATACTACCGCTAACAGTGTTAGGGAAAACGGGTTCTCTAGATGCGCCAACCAAAATGGGGGATTGATCCGCAAAAGCATTGTCTGAAGTCAGTTTTCTGATTGCGGCACGTGCGGCGTGGCGCCCAGCTCTAAATTCTTCCTGCCGTTTTCGTACTGATTTCTGGATATGTTCCTCTTCTAAAGGTAAGCCTGCTTTGGCGTACATATCTGGTGTGGCTTTCACCATCTAATCAGCGGACTGGTTGTCCTAAAGTGGAAGACGGGTTGGGGAGAAGATTCATCTAACTTAATATTTTTCAGAATTCTTTTTATTCACAGTAAAAACCAACCAAAAAACATAAATGATAATGCAACTGATAATTATTGCTATTTATACAGTTATTTGCTTTTTAAAGTTGGCGAGCCTAATTAATACAAAAAATAGATATAAGGATATGCAGTGAATACTCAAGTGGAAAAAGCCTTCGTTTATGCCTTCCATTTTGGCAGCAGCAGTAGTGACTGCTTTTAGTGGTCAAGCTAATGCAGCGCAGCAGAGTGAAAATCCAGAAATGGAACGCGTGGTTGTTACCGCGTCATTAACTCAACATTCTGAGCTTACTGCTCCTGCATCTGTGTCGGTTATTACTGCGGAAGACATCGCAAAGATGCCAGTGAAAGACATATTCTGAAGCAGTACGTAGTGCTACAGGCGTTAGTGTTTTAAGCAGTGCCGCGTACGGTCGTAACACCATTCGTATTCGTGGTCTTGAGTCAAAGCATACTCTGATCCTGATTAATGGCCGTCGTATCAATTCTCAAGATGCATTAATTCGTGGTAACGACTTTGATCTATCGACGATTCCTTTGACGGCCATTGAACGTATCGAAGTGGTGCGCGGCCCGGTTTCTTCTCTGTACGGCTCAGAGGCGATGGGTGGCGTAGTGAACGTGATTCTAAAAACACCAACAGAAGAGATGGCTGGCTCGCTAGGGCTTGAATATGAAAGTTTGCTAGAAGGTAATGGCGGCGATGGCTGGAAAGGCCATGCTTACGCAAGTGGCGAATTGACGTCAGAACTGGCAGGTACGTCATCGTTGAAAAGCTCGACTCGTGACCCATGGCGCACCGATGCGACGCCTGATTACGATGCTTTGGAGAAAAAAGATACCACAAACGTGTTCGGTGAAATGTCATCACAGACGGCAAAAGCTGATTGCCGATGTGACGTTCGCCGACGAACGAAAAGCCGAGTGGTACGCCCTACGTTCCGGAGACCAAACCAATACTCAGGATTCTACGCGCTGGAATTACGGTTTAACGCATGAAGGTAATTGGTCGGGGTTTGACTCTCAGGCTCGTT
TGTATGGCGAAACCATGGATTTGAATGACGGCAGTACCGCGTACACCAATGGCTCAGCGGACGTGGAGTTGCAGAATAATTACGCCGATTTCAAACTGTCTGGTTTATGGCAGAGCAGTACCGCCAGAGTGGGTGTTTGGTGGCGAATTCCGTACCTCAGAACTGACCAACAGTGGCGACATCCTTCTCGGTGATATCGATTACCACCAAGGTGCAGTATTCCTGCAGAGTGAGTTCGATTTGGATAAGCTGGCGCTAACACTGGGTGGTCGTGAAGACTTCCATGAGGTCTACGGCAGTCGCTTCAGCCCACGTGCTTATGCCGTCTACAGTTTTACTGATGAGTTTGTTGCAAAAGCGGGCGGCGGCGGTGGCTCCCATGCTGCGGGTATGATGGAGAGTAGCGACCAAGTTCGTGTGATAAGTTGTGGTAACCGCTGTTGGTTAACCGGTAATGATGATTTGAAACCAGAAGAGTCAGAAAGCTATGAAGCGGGCCTGGCTTACGAGACGGACTCGCTAGGTCTGGGCTTGACCTACTACTACTCTAAGCTGAAGAACAAAATCGAGCGAGACACCAGTACTGCGGTAGGTATGGATGGCACCATGCCAATCATCACTTACCAAAACATTGGCCGAGCAGAAATCAAAGGTATAGAGCTGGAAGCTTGGTATGACATTACTAAGAACATCAATTTGTCAGCGAACTACACATACACGGATGCAGAGGATAAGAGCTCTGGCGAAAAGCTAACAGATACGCCTGAACACTTAGCAAACTTGGATGTGAACTGGCAAGTATTCGACTCGTTGACCACTTTTGCTCGCGTCAACTACATCGGTAAGCAGGTGATCACAAATCTAAGTAGCGAAGATAAAACTGTGGACGGCTACACATTGGTTGGTCTGGGCGTTTCTTATGACCTCCAGCAAGTGAACCTGAAAGCGGGTTTGAACAACATCTTTGACGTTGAGCTGGATGATGAAGACGACTACTACGGTTACTAGGAAGAAAGGCCGCAGCGCATACGTAAGTGCAACTTACTTGTTCTAATTCAGTAGACTTCACATCAATAAAGAACGCCTCCCACTGGAGGCGTTTTTTATGCCGCTTACTTCTAAGTCTCAGGCTCTTCGGGTGTTTCCGCAGGCACTTGTTGGGATGCTTTAACCTGCACGATGCGGTTACTTTTCATCAGCAGTACTTCAAATCGCCAGTTCTGGTACTCGATGATTTCTCCTTGAGCAGGCACACGGTCTATTAACCAAGTGAGGAAGCCGTTTAGGGTCTGAAAACCTCGCTTTCTTCTCCTTCAATGTTGCTCAGCTGAAGTTGTTCGTTCATTTAATGGGATGAGTGCGTCCATCAGCCAACTGCCGTCTTTTTGCTGTTTGGCCCATATGTGCTGAGGCTCCATACCCAGCTCTCCGGCGATTGACTTGAGTAAGTCATAGAAGGTCACTAATCCTTGAACATCACCATATTCGTCGACGATAAATACCATCTCGGTACTGGTTTGTTGCATATAGTTCAGTAAAGGCAGCCCTTTCATCGACTCGGGTACAAACACAGTATTTTCAGACGTTTATTGAGCCGTTCGATCGAGAGCTTGTCATATTCATCCAGCAGCACTTTGGAAGAAATCGTACCGATGATGTTATCCAGGCCGCCTTTGCATATTGGCCAAACCGAGTGCTGGGTTTGTCGAAGGCTTTTTAGGGTTGCGTCAATGGGTAGTGTGGCGTCGAGATAATGAATTTCAGAGCGAGGCGTCATTAACGAGAGTGCAAGGCGATCGTTAAGATGGAGAAGATTTTGGATCATCAATTGTTCCTGAGGCTCAATAGCGCCAGATTCGGAACCTTCGCTGACGATGGCGACGATATCCTCTTCTGTGACGACCTCACTTTTACTCTCATTTTGCCCCAATATCCTGAGCAAGGTATCGGTGGTAAAAGTCAAAAGAAAAACAAATGGTCGGGCTAAGTTGGCTAACCAA
TGAATTGGGTAAGCCACATTGATCGCAATCATTTCTGCATTATTTTGCGCAATACGTTTGGGAACCAATTCACCTATCACAATGGCGAAGTATGTGATCAACAAAACCACACTAAACGTGGCTACAAAGTTGGCGATCTCTTTTTCTATACCCTGGCTCACTAACCATTGTCCTAAAGGAACGGAGAGTGTGGCTTCGCCAAAGATGCCGCTTAGTAGACCAATTACAGTGATGCCGATTTGGATCGTTGATAAAAACTGTGTCGGATTATTCTTGAGTTCCAAAGCCAGTGTTGCGCGTTTGCTGTTTTCTGCCATCGTTTTTAGACGGCTTGTTTTGGCTGCGACCAATGCAATTTCCGACATCGCAAAGACGCCATTGAGCGTAATTAAACCTATCAATCCTACTAGCAATAAAATATCCATGAAGCCTTCCAGTCATCCACACGTTAGTAGCATAGCGAGATTCCGGCTGAGTAAATGTGAAGTAGCGGATAAATCCCGTTTTCTTGCTAAGCTTATGTTTTATTTAAAGGGTAGAACTTCTCAGTACAGCTAAGCTGATTAATCACTCTGTTACTGATACACTACGCGCCCATTTCCTCCGTTGAGACGTTTCTATGCCATTTTCTAAGCTTGGTTTAAGCGCACCAATCACTGATGCAGTGATGGCGCTAGGTTATGAAAAACCGACTTCTATTCAGCAAAAAGCCATTCCTATTGTACTGCGCGGGAGTAACTTGATCGCAGCAGCGCAAACCGGTACGGGTGAAACAGCCAGCTTTGTGTTACCAATCTTAGAGAAGCTCAGTCAGGGCGAAACGCAGCGTAAAAAACGTGCGCGTGCCATCATTCTTACGCCAACGCGTGAGCTAGCGCTTCAGGTACATCAAAGCATTGAAGCCTACGGTAAGAACCTGCCAATACGCTCAATGGCGATGTTTGGTGGCGTTGAGTACGCACCGCAGAAGCAAGCATTGATTGACGGTGTGGACATTGTAGTCAGTACTCCGGGTCGTCTGATTGACCTGTACGGTCAACGCTCGATTCACTTTGATGAAGTAGAAATGCTGGTCTTAGATGAAGCAGACAAGATGCTGGATATGGGCTTCATCGATGCAATAGATAAAATCGTTGATTGCATGCCAGAGGATGTGCAAAGCCTGCTGTTTTCTGCCACGCTTTCTAACCCTGTTCGTGATCTAGCGAAGAATGCGATTGTCGATCCTGAAGAGATTACGATTGCGAAACATAGTGCTTCTAAATCAAATATTAAGCAGTGGATTACTACTGTCGATAAAGACATGAAATCGTCATTGTTGAGTCACATGCTCAAAGAGAATGACTGGTCACAAGTTCTGATCTTTATCGAAACCAAACACGGCGCTGCGAAATTGGTCAGCCAGCTGGAGAAGCGTGGCATTGTTGCAGAAGCGTTCCACAGTGGGCGTAATCAGCGTGTGCGTCAGGAGCTGATTGAGCAGTTTAAAGCCGGGGAGATCCAGTACCTGGTCGCGACAGGCGTTGCGGCTCGCGGTATCGATATCGATAATCTGCCAGTGGTTATTAACTACGACTTACCTTACCCGGCGGATGAATACGTACACCGAATTGGACGTACAGGTCGTGCTGGTGCGCAAGGTGAAGCTATTTCACTGGTATCAAAGGACGACTTCAAAAACTTATGCATGATTGAAAGCCGATTAGGCCACTTGCTTGAGCGTGTTGAAATTGACGGATTTGCACCAAGAAAACCAGTACCAATTTCTATTCTTAACTATGTGCCAAAGAATAAGCGCAAACCTCAGGAAGACCGTTCTCAATCTGATAAACGTCAGACGAGAGATTGATAGCAGGGTAAATACCTCTTGATAGTAAGCCGCATTTTAATGCGGCTTTTTTTATATTCTTGATCTTTAGATATCTATCTTTTGTTAAGCATAACGATGAAAAAAAACAAATCATGTCATACGCTTAACATTTGTAACCTAAGCTTGGTAA
GGCTTTGGTTAATTTTATGTCGTCTGGTTTGCACAGGGGAAAAAATAGGCGCAGACTTTTTCTCTTGCTACTAAATGCATAAAAGAGGCATAAATGAAAAAAGGTGATAAGCAATTCGCTGTTATCGGCCTTGGTCGATTTGGTTTGGCTGTATGTAAAGAGCTGCAAGATGCGGGCTCTCAAGTTTTAGCCGTCGACATTGACGAAGATAAAGTAAGAGAAGCTGCTGGGTTTGTCAGTCAGGCTATCGTAGCAAACTGCACTCATGAAGAGACCGTCGCGGAGCTAAAACTCGACGATTACGATATGGTGATGATTGCGATTGGCACCGATGTGAATGCCAGTATTCTTGCCACCTTAATTGCGAAAGAAGCCGGGGTCAGATCGATTTGGGTCAAAGCCAACGACCGGTTCCAGGCCAGAGTTCTGCAAAAGATTGGTGCTGATCATGTCATCATGCCTGAGCGTGACATGGGGATTCGCGTCGCGCGTAAAATGCTCGACAGAAGAGTGCTAGACTTCCATCCACTAGGCAGCGACTTGGCAATGACCGAGTTTGTGATTGGTTCTCGCTGGATGGGCAAAAAGCTAGGCGAGCTGTCTCTATGTCAGGTTGAAGGTTAAGTTCTTGGCTTTAAACGTGGGCCGGAGATCACTAAAGCGCCCAGCATGGATGTGACCTTAGAAATTGGCGACCTTATGATTGTGGTTGGCCCAGAGGAAAAACTGGCCCGCACATTGAAGTCACTATGATGCATTTTCACCAAAAAGGCTTATTTTACGTTCCAGATAGCCAAAGGACAAAAGAGAAGGGCAGCGAGCCTCGGATTATCCTAATGAGTTTTCTCGGAGTGCTATTGCCCTCTGCGATACTTCTCACGCTTCCCGTCTTTTCTGTCAGCGGATTATCAATCACCGATGCTTTATTTACGGCGACTTCTGCGATCAGTGTAACGGGCCTCGGCGTTGTGGATACCGGTCAGCACTTCACGTTGGCGGGTAAAATTCTCTTAATGTGCCTAATGCAAATTGGTGGATTAGGGCAAATGACTTTGTCTGCTGTGCTGCTCTATTTATTCGGTATGCGGTTGAGTTTACGTCAGCAAGCATTGGCCAAAGAAGCGCTTGGTCAGGATCGGCATGTTAACCTGCGTAATCTGATTAAAAAAATCATGGTATTTGCCCTAGTGGCAAATACCATGATTTTTTTTAATCAGATTACGCAGGTTAACATGCCGATCCTGACCAAGCGCTTCTTTGGCCAATGCTTGCTGACGTAAACTCAACCGCATACCGTAAAAAAGACAGCGCGGCACGAAGTTGGTTCCTAATCCACCGATTGCATTAGGCACCGTCAGAATTTTACCCGCCAACGTGAAGTGCTGACAGGTATCCACAACGCCGAGGCCCGCTGATCGCAGAAGTCGCCGTAAAGACGTTCGGTTGATTGATAATCCGCTGACAAACAGGGAAGCGTGGTCTCGCAGAGGGCAATCGCGCTCCAAGCCCGTAGGATAATCCGAGGCTCGCTCCTTCTCTTCCTCCTTTGGCTATCTGGAACGTAAAATAAGCCTTTTTGGTGAAATACCATTTCGATGTCCAGGCCTCGGCGGACCACAATCATAAGGTCGCCAAGTTTAAGGTCACATCCATGCTGGGCGCTTTAGTGATTCCGCCCACTTTTAAAGCCAAGAACTTGCACACCTTGCTGGACGTCATAAGAGGCTCCGCCTCACTGCCCATCCAGCAAGAACCAATCACAAACTCGGTCATTGCCAAGTCGCTGCCTAGTGGATGGAAGTCTCACGCTTCTGTCGAGCATTTACGCGCGAGACACGAATCCCCATGTCACGCTCAGGTGCGGCATGATCGCCGATCTTTTGCAGAACCTCACCTGGACAAGAATTGGCTTTGACCCAAATCGATCTGACCCCGGCTCTTTTCGTAAGGGGCCCCTCGCATCAGCATCGGTGCCAATCGGCCATCATTCACCATATCGT
AATCAGTCTTAGCTCCGCGACGGTCTCTTCATGGTCCCGCAGTTTGCTACGATAGCCTGACTTGAACCCACAGCTTCTCTTACTTTATCTTCGTCAATGTCAACAGCAGCCCCACATTGTGCGGCCCCTTTACATACAGCCGAAATCGACCAAGGCCGAAATAACAGCGAATTGCTTATCACCTTTTTTCATTTATGCCTCTTATGCATTTAGTGGCGAAAAGTCTGCGCCTATTTTTTCCCCCTGCCGAAACAACAGAAAATTAACCAAAGCCTTACCAAGCTTAGGTTACAAATGTTAAGCGTATGACATGATTTGTTTTTTTTCATCGTTATGCTTAACAAAAAGATCATATTCTAAAAATCAAGAATATAAAAAAAGCCGCATTAAAATGCGGCTTACTATCATATTACCCCTGCTATCAATCTCTCGTCTGACGTTTATCAGATTGAGAACGGTCTTCCTGAGGTTTGCGCTTATTCTTTGGCACATAGTTAAGAATAGAAATTGGTACTGGTTTTCTTGGTGCAATCCCGTTCAACGCGCTCAAGCAAGTGGCCTAATCGGCTTTCAATCATGCATAAGTTTTGAAGTCGTCCTTTGATACCAGTGAAATAGCTTCACCTTGCGCACCAGCACGACCTGGCGGTTCAAGGTGTACGTATTCATCCGCCGGGTGTCGTAGTTAATAACCACTGGCAGATTATCGATATCGACCGCGAGCCGCAACACCTGTCGCGACCAGGTACTGGATCTCCCCGCTTTAAACTGCTCAATCAGCTCCTGACGCACACGCTGAGTACGCCCACTGTGGAACGCTTCTGCAACAATGCCACGCTTCTCCAGCTGGCTGACCATTCCAACAGCAAGACGCCGTGTTTGGTTTCGATAAAGATCAGAACTTGTGACCAGTCATTCTCTTTGAGCATGTGACTCAACAATGACGATTTCATGTCTTTATCGACAGTAGTAATCCACTGCTTAATATTTGATTTAGAAGCACTATGTTTCGCAATCGTAATCTCTTCAGGATCGACAATCGCATTCTTCGCTAGATCACAAACAGGTTTAGCGTAGCAGAAAAACAGCAGGCTTTGCACATCCTTGTAGTCAAATCAACGATTTTATCTATTGCATCGATGAAGCCCATATCCAGCATCTTGTCTGCTTCATCTAAGACCAGCATTTCTACTTCATCAAAGTGAATCGAGCGTTGACCGTACAGGTCAATCAGACGACCCGGAGTACTGACTACAATGTCCACACCGTCAATCAATCCACAAACATCGCCATTGAGCGTATTGGCAGTGCCTTACCGTAGGCTTCAATGCTTTGATGTACCTGAAGCGCTAGCTCACGCGTTAGCATGACTGATGGCACGCGCACGTTTTTTACGCTGCGTTTCGCCCTGACTGAGCTTCTCTAAGATTCGTAACGTGCGCTGTTTACCGGTACCGGTTTGCGCTGCTGCGATCAAGTTACTCCCGCGCAGTACAATAGGAATGGCTTTTTGCTGAATAGAAGTCGGTTTTTCATAACCTAGCGCCATCACTGCATCAGTGATTGGTGCGCTTAAACCAAGCTTAGAAAATGGCATAGAATTTCGACGGAGGAAATGGGCGCGTAGTGTATCAGTAACAGAGTGATTTTAGCTGTACTGAGAAGTTTCTACCCTTTAAATAAAACATAAGCTTAGCAAAACGGGATTTATCCGCTCTTCACGTACTCAGCCGGAATCTCGCTATGCTACTAACGTGTAGTAACTCAAAGCTGCATGGATATTTTATTGCTAGTAGGATTGATAGGTTTAATTACGCAAATGGCGTCTTTGCGATGTCGGAAATTGCATTGGTCGCAGCCAAAACAAGCCGTCTAAAAACGATGGCAGAAAACAGCAAACGCGCAACACTGGCTTTGGAACTCGAGTAAGTCCGACACAGTTTATCGACCAATCCGAATCGGCATCACTGTAATTGGTCTACTCAGCGGCACACTTGGCG
AAGCCACACTCTCCGTTCCTTTAGGACAATGGTTGAGCCAGGGTCATGAGATTGCCAACTTTGTAGCCACGTTTAGTTGCGGTTTTGTTGATCACATACTTCGCCATTGTGATAGGTGAATTGGTTCCCAAACGTATTGCGCAAATAATGCAGAAATGATTGCAATCAATGTGGCTTACCCAATTCATTCGTTACCAACTTAGCCCGACGACTCACCGATACCTTGCTCAGAATATGGGGCAAAATGATAAAAGTGAGGTCGTCATCGATTCCGCCATCGTCAGCGAAGGTTCGAATCTGGCGCTCTGAGCCTCATTGATAATCCAAAATCTTCTCCCAATTCCTTGCACTCTCGTTATGACACCTCGCTCTGAAATTCATTATCTCGACGCCACACTACCCATTGACGCAACCCTAAAAAGCCTTCGACACCCAGCACTCGGTTTCACCAATGCAAAGGCGGGTACGATTTCTTCCAAAGTGCTGCTGGATGAATATGACAAGCTCTCAATCAAACAGCTTAATAAACGTCTAAAGCAACCGCGTTTTGTACCCAATGCGATGAAAGGGCCTCCTTTACTGAACTATATGCAACAAACCAGTACCGAGATGGTATTTATCATCACGGTGATGTTCAAGGATTAGTAGTATGACTTACTCAAGTGAGAGCTGGGTATGGAGCCTCAGCACATATGGGCCAAAAGACGGCAGCGTAGAAAGTCAAAAACAAACTCCAGCTGAGCAACTGATTGAAGGAGAAGAAAGCGAAGGTTTTCAGACCCTAAACAGTTTCCTCACTTCATTCGACCGTGCCTGCTCAAGGAGAAATCATCGAGTACCAGAACTGAGCAACAGAAGTACTGTGATGAAAAGTAACCCGCATCGTAGGTTAACGAGACCCAACAATTGCCTGCGGAAACACCGAAGCCTGAGAGCATGGGATGCGTTCTTTATTGATGTGAAGTCTACTGAATTAGAACAAGTAAGTTGCACTTACGCATTGCGCTCGGCCCCTTCTGAGTAACCGTAGTCGTCTTCATCATCCAGCTCAACGTCAAAGATGTTGTTGCAAACCCGCTTTCAGGCTCACTTGCTGGAGGTCATAAGAAACGCCCAAGCCAACCAATGTGTAGCCGTCCACAGTTTTATCTTCGCTACTTAGATTTGTGATCGCCTGCTTACCGATGTAGTTGACGCGAGCGAGGTCAACGAGTCGAATACTTGCCAGTTCACATCCAAGTTTGCTAAGTGTTCAGGCGTATCTGTTAGCTTTCGCCAGAGCTCTTATCCTCTGCATCCGTGTATGTGTAGTTCGCTGACAAATTGATGTTTCTTAGTAATGCCATACCAAGCTTCCAGCTCTATACCTTTGATTTCTGCTCGGCCAATGTTTTGGTAAGTGATGTTGGCATGGTGCCATCCATACTGAGAAGAATTTGTTCTTCAGCTTTTAAGAGTAGTAGTAGGTCAAGCCCAGACCTAGCGAGTCCGTCTCAGGGCCCACTTCTGCAGCTCTTCTGGTTTCAAATCGTCGTTACCGACAGCAGCTTATCCGCGCGAACTTCGTCGGCTGCTTCTCCATCATACCCGGAGCACGGAAGCCACCGCCGCCGCCCGCTTCAACGAAGCTGACGGCTACCCAAGCGGCATAAGCACGTGGGCTGAAGAGTACCATAGACCTCATGGAAGTCTTCACGACCACCCAGTGTTAGCGCCAGCTTATCCAAATCGAACTCACTCTGCAGGAATACTGCACCTTGGTGGTAATCGATATCACCAGAAGGAATGTCGCCTGGTCTGAGGTACGGAATTCGCGGCCAAACACCCACTCGTGACTGTCTCGCCATAAACCAGACAGTTTTTGAAATCAGCGTAATTATTCTGCAACTCCACCTGCGTACGGTACTGCCGTCATTCAAATCCATGGTTTCGCCATACAAACGAGCCTGAGACTAAACCCGACCAATTACCTTCATGCGTTAAACCGTAATTCCAGCGCA
TAATCCTGAGTATTGGTTTGGTCTCCGGAACGAGGGTGCACCCACTCACTTTTCGTTCGTCCTTGGTGTACGTTACATCGGCAATCAGCCTCGTCTGCAGTTTTCGTTCACCAAACACGTTTGTGGTATCTTTTTTCTCCAAAGCATCGTAATCAGGCGTCAGCGTCATGGGTCACGAGTGTTTTCAACGATGATGCTCTCCCAGTTCTGACGTCAATTCGCCACTTGCGTAAGCATGGCCGCTGCCAGCCATCGCCATTACCTTCTAGCAAACTTTCATATTCAAGCCCTAGCGAGCCAGCCATCTCTTCTGTTGGTGTTTTTAGAATCACGTTCACTGCACCACCCATCGCCTCTGAGCCGTACAGAGAAGAAACCGGGCCGCGCACCACTTCGATACTTGGTTCAATGGCCGTCAAAGGAATCGTCGATAGAAGTGGAAACCGTACCACGAATTAATGCATCTTGAATTGATACGACGGCCATTAATCAGAATCAGGTAGTTGCTTGGACTCAAGACCACGAATATCGAATGGGGTACGACACGGCGCCTGCTTAAAACACTAACGCCTGTAGCACTACGTACTGCTTCAGAGAGATGCCGATGTCTTCCGCAGTAATAACCGACACAGATGCAGGAGCAGTAAGCTCAGAATGTTGAGTTAATGACGCGGTAACAACCACGCGTTCCATTTTCTCATTTTCGCTCTCCTGCGCTGCATTAGCTTGACCACTAAAAGCAGTCACTACTGCTGCTGCCAAAATGGAAGGCATAAACGAAGGCTTTTTCACTTATTCGCTGCATATCCTTATATCTATTTTTTGTATTAATTAGGCTCGCCAACTTTAATAAATAACTGTATAAATAGCAATAATTATCAGTTTTACATTTATGTTTTTTGGTTGGTTTTTACTGTGAATAAAAAGAGTTTGAAAAATATTAAGTTAGATGAAATTCTCTCCCCTAACTGCTTCCACTTAGGACAACCAGTCCCGTATTGGGATCGTCAAAGCCACCAAGAAATATACGCAAAGCAGGCTTACCTGAGGGGAAGAACATATCCAGAAATCAGTACGAGAAAACGGCAGGAAAAATTAGAGCTGGGCGCCACGCCGCACGTGCCGCTGCGAAAACTGACTTCAAGTACGAGATCAATCCCCCATTTTGGTTGGCGCGTGTCGAGACCCGTTTTCCCTAACACTGTTAGCGGTAGTATTAGTCATACAGAGCTCCTGGCGGCGTGTGCGGTTAAAAATGATGCAAGTATTGGGATTGATGTTGAAAACAATGCAATTCTGCCGCTCACCTTCTTCCTGCGATTTACACACCAATGAACGACGTCGCCTCTACAAAAAGTAGAAAAAAATGCCATCCCAAACATTCTTATCTTCGTCCTCGAGTTTATTCAAATGTCTGTTTCCTTTCGTGAGGGTATATTTCGACTTTTTAGATGCCGAAATTGCGCCTCACCCCAGACGTCAGTTTAGGTCATTTGGAGATAACCCAAAGTAATCTCCAATCCGCCTGCCAGATTTAACGTTTCACGGTTATTATTGCTTTACCGAACAAACCGTTTTCTGGCTGTGTTTACTCCCCCAAGAGTCCCTATTCAATCACCTTTTGAATTACCACTCTCTGGCGCTGATTCACAAGCGGTAGCCTCTTCTTTAGCTGCGCCTTTTCCAGCGGTTCAACGCCAGCCGCACTCAACTGCCGGTTTGTGGGCCTAAATACCAGTCATCCACCAATCAACGGGTCCTATCAGAGTAAGGATTCACCGTAGGTCTAGGAACCGGAATCGGCAAAGTTCATCTGGCTTGGGCTGTTGGGTGGAATCGATACCCCGATTTCAGCGAGTCTTGTTCAAGCGTTGTCGGCAACATATGGCCGTGGCCGTCCGAAACGTATTGGACACCGATACCTGTGCCACTAACACCGATAATAGCAAGTTGCAGCCAAATGTAGGAGAACGCTTGAGAATGTGTTGCACGACAAGAGC
ACGACCACACCTTCAGATTTTGGCGCTTGATGACTTCAATACCTTCAACCAGAGCCTGGTCTCGCTCTGCATTAACTAGGGCAGGCTCCTGACGGATGACGAAGTTCTTGCCAATCAAACGGAGGCTCGGTCAGTACAACTTTGATAGGAGTGTTACGCTGTGCGTTTGCTTGCTTTAAGCGTCGGAAAAACTCGCCGGTACATGCAGGCAGGAAAAGCGATGGAATCTAGCCAGACTGCAGCCAGCTCATCATCAGAAATGTCACCTCCTGACAGATAATCATCCAACATTTTTTGATGCTTGCAGCATTACCAAACTCGACAAACAGATGAGTAAACACACCATCCAACGCCTCAGAGCAAAACCTGCGTCATTTGCTCAAAAAGGTCTGCATACCAATGCGCTCCACCGATGGCCACGAACCAGTTCCAGTACCTTTGAAGAAAGTCAGAGGGGAAATGCTCCCCCTCCCCACTTATTCTCTCGTAACTTTGCAGAATTAAAGCGCAAATTTCGCTCTCAGTTCTTCAAGCCATTCAGTTCGATAAGACATATTTCTCAGTAGCATGGGGTATTCTCGGTGAATACGATCAACATCCACGACATTGATTTCACCATCCATGTAATTAAACCAACCCGTGCAATCCCAAGATAGCTCTTCTGGTGGCTTCTCACTGAAGAACGACGTATTCACCCTGATAGCGATAATCTACGAATCTCGTACGCGATGGTTGTTGTTGATCGCCACTTTCCCGGATCCGTCGCAGAGATGGTCTCGGAGCTCAGGGTGAGCCATCAACAATCCCGCATCTTGTGATACTGAGATAACTTCTGGCTTGGCTCCGCTATGCGCTGATTCATGGTCCCCAGAGCTTGCTCTTCACCAGGTGGATTCATGGTCTGCCACACGAAGTAAGTAAACCGACTTCTTGTCCTTGCTGCTGCAAGATACCCGCCATCAAGTGCACAAGATCATGCCACCGAGATAGCCGGCCAAGTGAGATCATAAAGGCTGTTCTTCACGGATAGCAGCGACATAATCTTCTGCCATCTCCTTCATCGTTTTCGGCAAAGCCAAAGAACGAGCCTGCACACCATATAGAGGAATGTTAGATGGAATAATTGGCGTCAGCGCCGCATAAACGTACAGTAACACAGAATATCAGCGGCTTTACCCTCGCGCCATCGCGCGTATTAAGCGCTTCATCACTTTCGCTACCATTTAGTTTGGCTGCAATGCCAGCAACGGTCGGAGTTTTCAAACACCGCGGCCAGTGATAGCTCAATGCCCATAAATCTCTTTTACATGCGCAATAAGCTGCGCCGCTAACAGATGTCCGCCAAGCTCGAAGAAATTATCTTCAACGCCCACAGCAGGAACTCCGCAGCACTTACACAAACGTTCTTCGACCGAATTACTCGGCCTACCGACCTGACCAGATAGATCGGGTTTTGGCAGCGGCTCCGTTTGGCGTTAACGGAAACGCATCGAGCAACACCACTCGGTACTGTAGCTTCTGGCAGAGGATCTGCAAGCGGCCGCGTCGAACTATTCCGCATCTTCTGCCGTCACGTAAGCCACCAGACGTTTATCGCCATCACTGTATTCTTGAGCCAACACCACAAACTTGTGCAGCATCGGGATGGCTGGCTAACGCATTTTCAATCTCTTGTCCGTTCAATACGAAAGCCGCGGATTTTAACCTGAAAATCACTGCGACCACAGTATTCTATCGCCCCATCTTCACGCCGACGTGCGAGATGGCCCAAAGTACATTCGGCTTCCTGTTGGGCCAAATCGATTGTCGATAAACCGCTCTGCCATCAGTTCCTGGCTGACCATGATAGCCGAGCGCAGACCTGACGCCCCGCAAAGATGACCAACCACGCCCGGTGGAACCGGATTAAGCGCATCATGTTAGATTTGTGTATACGAGCCGTTCAATTGGGGTCGAGCTGCGTCTCGCTCGGCCCCACGGTGGGTCATAAGTTGTGTAGC
GGCGCATCGAAGGATTGGTAATATTGGTTTACCAACTCAACCGGCAAGGCTTCCCCACTACAAAACACCTGGCGCAAGCTTTGACATAGTTGCGCATCCGCCTGTTGCACAAAAATCTGCAGCATCGATGGAACCCATTGTGGTAATCTTTGAAGAGGTTTTGGATCATTTCCTGCAGGATGAACCGAACCATCTGGTTTCGCCACCACCAACGAGAACCGACAATCATTGGCCAGAAATTCCCACACTGACACATCAAAGCCCGCTGGTGTTTTTTGTAACACGCATGTGGCATCAATCGGATATTAATAGTGCATCCCGTAGCAATCAGTTCCACGATAGCATCATGACTCACTACCACGCCTTTTGGCTTACCAGGAGCCGGAGGTATAAATCATATAAGCCGGCGAATGTGGTTCTGGATGAACTTGGGGTGGTAACGGCTCTACATTCTTATAAATAGCAGGAAATGTTTCACCATCGACAAGTTTACTCTCGTACTCGTGTGGCAACTTGGATTGTAATGCCGAAGTCGAGCACGAGCTTAGGTGCAGAAGTTCCAGCATGTAATGAATGCGACCTACTGGCCATGCCCGGTCATCGGCACATACACCGCACCAGCGGCGGAGTGACTGTTGTACAACGATTAACTCTTCACTTCGCGGTACACAAACAGCAATGCGATCTCCTGCTTCCACACCTTGGGCAAACAGCCAGTTCATGGTAGCCTTTCGTACCCATAGGTAGTGTTTGATTCTCAAAAATCAGTGCAGGAGCATGAGGCGTCAAGATTCGTTGTCTTGCCATCAAGGCTGATAAGGAAGTTTCTGGAACAGAATGAGCGGTATGGTTCCCAGGTATTAATAGCTTCTCGTTCACCAGCAAGCAAAATGCTGACATCGCTGATCTTACACTCGACTCGCACCCTGCTCCACGAGCAGCCGCAGAAAAATGGAACAGACGCTGCTGGTGTTCTTGAATTTCACGCTCTGAATAAAGCTGGGGTTGGCATCCATATCCACATGCAACTCGCACCAATCATAACAGTACAGAGAAATGTCACAGCTGAAAGGTTATGTGCCTTAGAATTTAGCTCTCCAAACTTATGCTCATACTCAAACGAGGCATAATGTTCACCAGCGGACCAAACAGGTTACGGTTGTCTTTGACCAGATTAAGTTCTCGGTGTAACTCTTCATAACGGTAGCCTTGATGACGACGTACAGAAGAGAACTCTTGGTTGACCTGCTTGACCAAGGTTTTGATATCCAGCCCCTCTCAAAGTGGCAGAATATTCACTCGCATTGCAGGCGTTTTGAATTGCCACGGAGCCTAACCGCCCACGCCAAGCACCAGAGCGAGTGATTGCGATATTACACCACGAGATAGAGGCAATAAGAAGCTGCCGGTGCCTGTGTACGGCACTGCTTCGCCGATAATTCCATCTGCTGAAAATCGCTGACTAGCAATACGTAG
+
#$&('%###$%%%'((*4;>GFBCCCBFGGKJMFFDDFFGIIJRbKKMJeJKPJJNRJI{JLIJIOIKIHJJIQJKIKIHLEKHJFHOJ{G{MGKGDJGGGGHIFCBB00000767==AF=>=@?<<EJI{KHLHJHIMMIEAA@AB==={K{HMTJSIG{N{LGHMGHHGFBECGTIJILGEFID3(((1{55556{JGIJ{M{POJGJJIJPK{GIMQO{{OJF{H{@A@@<==>B@=88<AEKMM{IIFIP{GHGKJ{JMVJ{KQ{JKJOHF{H{LSJPIKKJGHGHGI{IJ{K{HIIK{L{N{HJJ{^QK{JKJJNH{P@@@=>;:::6300//045@AKEDCGGBCC55555::<:6;6678JHIJG{KGJQMHJQGJKKKPKNNHKOOLJUH{JILKII{L{LIJ{IHOJKSLLLHK{MIRHJNJNHILLLIGJFIN{PKKJLI{KKOHB9EBBEPC@@@@@N{SHTLKL{J{JIPHM{H{QWKSJIOI{GGQ11NHGQGEGFBABFJDDDDGGN{OG{HHM{ECCDDE?GIHIO{KIMINKKIOMHV{JGHE11111D<<=DJJH{{{{H8767740--HIEAJEEDFFM{I{LNJKLMJK{{K{HDDD===>?KQDMEGJGHH?>>>>CEEFFGIPIIMIIIIGL{{M{ML{MHIJKKJ{LPNLMIJJIIJKHRM{P{{JMFDFE{IEEPI{HMJKHFIUHJFFSO{MJPMKOGZLMN_KKM{RECPK{LJFLPIII{MH{JO{JSGBEDDDEHLMI{HNIIK{G{{H{JHGMIHJGI??41--,+())-.?BBJRGIL{KIK{H{J{GFHMRJNZMKKLM{J{J{K{OMJ{{{{RLIMN{JKPIH{NIOMHD9EFHILKKG{{KFJGBFIJIJNJIGKRJIJ{HIKIIKJJMSIJJMOQ{H{I{ITLHHPL{HNPLJNMJHGHNHIKJKJJMLJOLHNO^LIKHJ{PPL{FIK{MKIIODDNHHFHLI{H{PC=?{{PQO{{GHM{JJJILMN{JNVLJHJHLJ{K{M{QLIM778LEK{PIJK{PHEEHEF9:MMKQJLNI{{IHIFIGMMHKPIJ{LKLNJGG=46;<@CA:9987=GQLM{{JLO{N{UP745<<{{HKIJIKOIO{JI{{JK{IIJHLJP{IMK@BB@@??BFN{JJHNJQSQPKH{666E??DDFLKJGJJKKGR{LLLKJ{JKSHJ{JHLKDGAAAFHO{NQMOSQL{HKJIKJXLHO{GO{{JOKJHIO{JHK{K{JH{MLHR{KPKJLOOKMLKFMJILIP@????GJKGIKKNINH{L{{JKJGKJ{GOFL{KONIJ{S{MXIHMOFLJJDDM{MGIGOGFHGJGCA;;;66IHHJNLRNVOHLMH{{INL{JKGJ{KPGN{M{HLJJ{OOLJIH{{KKK{{J{LKJPK{JML{QKGOKJ{NSH{FE{88888BCIHEIGI{NRJ{JIHKHLIHRHHIJ{H{GGKGH77JIMVHKLJ{NJIIJLLILD@@@GPHRL{OQILLEGNHFHJIHKMF,++++GF?ABGJEILHM{IKKLJK{INMHKJIKHPJ{HOJKIJLISKUHLMMIONKGDEEDFGCCE55555432347>GPFQIIII{GNKGHINHNHNJGHMK{IITHFOKEVX{JJ{ILGGIGNbHK{MHMLTOFEFFJ{HELKJJKKUKO{OHIM{PSPPKHJGIFDCM?>GGNLJLKI???>B>CB?99863))1012HUH{{KIKLJLKOKP{LMOROPJL{HLJHHWHHMLLJKKPEEGEIHGHKOKMS{JOL{NLLQ{RLTUKUH{VI{{JNKNLN{KGJJNKJONJLM{L{{KSIMJMIHJKLNHTH{{PIJJH{ILKJ{JQGJKKJ{JJEK{{IKN{LJH{{MRILHKIMIPJJE=<<<===>>>IHHJKIIJLIIKQKF{IHE?>?>A*)))%%%%'*'''''&&'+5556=>>EFED44344GNIIKLKIT{JGLGJKLJJRGHGFGOC?999:LOL{KNHMJNLKJIJ{NIYHMHUKNH{{J{L{KJ
IHHLPKNJMHNGO{LNJQIHMLJGKKHL{JL{KMMJ{JMGII{K{L{SJ{{NL{JISKJLMNKP{IJHIKYOFKMJJMN_KM{LMPKNIRLTH{NIKIMKLW{OJIHHMPJ[RLHKOFEHJMFKOEG65557<<?E?>AEDFE{>FAG43333322435...00MIGD@IRMJKD{GEJJJ{M{RILNHIHJMMLMNMKK{IKJKIL{LM{GLM{FDCFEH{GHGBA>>=??@C{IKI{KGILKIKJM{MOPPGECEGKI{NFIKMLKK{QHJQKLK{{LRJPHIKRHLPL{JKMKLJ{I{YSFK{{{VKJIRIRO{LFJ{JNGHJ{KIKHJILMP{{OMLRJJ{MJHKGLNJHHIE{LKHKQ{HJMLK{JKJKKLHIQLKIMKJHKHLQIZMJKILGDDE;KH{KFCECADGEK{JHGJGOIHHBHA=965533367FGGJIIKIHKIMI{KIHV{G{IMMWKHKHLOIKIFQJJHKJ{OJKOLIIHFDH{LP{G{KHL{L{INFHGNJNJ{L{MHMLIGLIIJN{MKKNN{JLFMJIOIIHJJIKJKIMLCG=)&&&&&(8DEEEGHHMJHHRGJHEMG55//-,,--A@A>?@AIKHJNIQHIH{YKKHGFCDIJHGNHKL{HKK{KSIO{IIGHNJ{KIO{98889;IJKK{IJJGGFIOHFEFFDEHJGH{KHLJN{LHGLFKFFFPFIGHDDGDCGFJKG{KHcFFIIGHI{D777CDDCCCDFGGHJGJHGKIGIHJDEBBEDDEFGSHGIIIQIJHKEHHG?FFGX{HGEDDKELGGFEFGGHEEEBABCCECCDGHCEECDHCDDHEFHRLHNEFIHFGEECIHJJZLNLGJN{L{{{{I{MIPKIL99999CJJN{JIKM{GI{K{KJKL{KFLPJ{ZMMVJLROIJJKIHHOKNM{II{JJHPOHJPKKHLEGMF{NI{M{MIJKJYOGJEJLLTIGMOOJKP{KNHOJOOL{LMLKRKJJ{KKJ{LHF{{JOFG{{J{TM{KIJ{KIIIP^{KKJ{J{{{JIJ{K{L{MJ{{MIOSKNNIIJHGJHIIJ{JMFO{LIK{JIOP{JKKJNMIMILHGGDEGDC88889G{MXK{JHK{LPNK{LLLIMHJGIHQI?ADDFI{JNMI{KHHIFHH<778&&&$%%'?JIIHHMEJKLHMQ{QKPIIGKKMIK{MK{JSJVKJWHIM{KPKH\{JGLWKHHTK{JGYJN{JJNKJIJ{[JMJKIHSHKOLFGFEHHEMEFEEA:875)))('''))(&&&&'++01399::::<@@>@??DJFECFF977778><@IDILGGHOIKI{E@????:A<99:FEHONIIKDEEDEL{BEEC?@@@=A@;4=99:HIGEADHDEDHH{HJMIJMH{R{M{GKLO{I{HHIVHDAF3@CCEFGJ]GPIJJJ{{HKGNKGJJIIHGDACC81222?B@/@==<<:<AEBOJGILIKFDEMDL{KJH0000.,,++&&&&&'&&&&,-889@<;/.-++++(()))877445@>@@@@CCGEGDD<;;<<SIGFFEECE{::9::EJFIHMLMH{MJ{JFL{{LOJKKNIKLH{IQOJR{MMNEFCDEIEA@LLHJJJLIL{RGK{H{H{J{QOHED{CBM{HHHMOQNGHBGDHJKUQMJLPL{KJ{QLM{RJLJN{J{MLKFB---,/1,,-(()AEIT{{N{NL{JL{{{{N{LGIJGGI?;99::<</.../4+),,-<C{I{HKF{?>>>?>@DGDGEIFMKIGKAB@FF????AFFDLHHGNEGJHGJNKCD>>>=>?DFEDHC?>;;665,,,.1FGFFCFHK{LJL{MXR{KHJKFFIDFNG{FIIGIQHJJLJMJLFJOQMH{JLMJFGDEEC21112BDFFKFEFJ{MLJIH{FD@76223,,'&&%%&&&'()./044@FRLMHH{M{{GJILJMC{FE>>DCEEBACAB{HFGBDDFDFGJ{{I{JPHJIABBCBEBA<::;;II{JLGBCDCNIKEF=<<20///0--../0///0331('&''
''333??AAFA;>>912667LM{MKFJK{JJILJHQJHNJHP{OH{I{{GHF@=)(()<====FG?==,*'''&&)*+-+))))(%&'(*0(%%$%'((''(((=FNIJILEI{HKLIJNHI{FD{{?78**146><7677=ADFGKJPS{RJO{LJOMGOFKFFDLKIOFMEG{OMLV{>=<<=<====BABDFGGFKMOHG{JG{GLG{MKMF{NGILKIIIHINLMNL55554B4,(''()++/--./ABBJFQFMPJOELG{M{FFFHMHJH76667BBCRMK{HUOHFKHGGIFJIJGE@+*))(((((((()6696777787556CCCFEDEDB@BB@FA?96---+,,**)((335>>CEEIGFFSFLILD?B37800DCB;;;;JJ{JEC@@@AAEB6?>@@{I{JKHFGGEB??A@AHHIGE;1DDCCBCMFKI{MKLFKJBC@><@AABIHBA??<?<=70=?KGNGIJTM{HAAAAEDFHA22222GDICHEGFJIP{G{GEBEFIHGMKHJHFGJJFGHKK{ILJFFFED@**)))/()11255KIEJHMH{LGIH{G:::::{HQI{GNKGHK{KK{SJKIJHHI{LKKLMN{JJHJFMK{IQKF{N{NLGIHTHBAABBFI<;4*+-FCFKEDFEE{OIK{HGGKF{H{VK{HK{GG{LGFOF{GFJMHQN{JJVLKNIFFGGFHJIHLLHIIH{LLHIGHEDCAA<<<==A@@DGHQJOELGCKPH{KKIKCABAB{EKIIMMMQJJKFFDCFF43336><>=?==<<AEA@???AIEF>=<78ADDJHIGMFFKKNJ{FPFCJLHMJOJHLHFKF::9//001AGZHLKNKHK{JGHEFGL{HVLG73BCHIMAA??87660--..7-,''''07;@CBB???@EGIDGFC@>;;96555/)'&&''++,.62**''()()/2BBC>????{JKEB<90+)''&'))),766?@FE{HHHGBEEOJJ{J{MLFKR{J{KOIK{GKKGLG{GGGI<+***+?GEFFEJLMJT{NRHLPMHC9000/*))***--5C@@@@@{HKKHKNKGQ{IG{HC4444{MGLK{GH{M{{MNHH{H{KS{@@?A77721224<,((9F{L{I{QT{HOHFII{JJNHJE@M:64444NKIO{FGFFEID@=@;=<<<@6:;;<OHKJHHGKHGK{IOKNFIOOHIOM{HIIIN{KFJ{QVHJK{KIKMKRHHJNHJRTIEIJM{GNHOHIMK{HHFGJI{HFKH{JADBCO{{HJMIJ{JRUIJ{J{JGGN{H{KLECDGDFBBBE===8999FFKM{IIKDGBBCT;<<;;;))))))BBGJM{IG{J{RIFLIXMNJJI{F{{JQGIIKPJI{JQLKIMHFKL{HJIPHIQW{KIOHKJKOMMPHL{LQIKN{{LP{JKISFHEHQNM{{J{JIHDEKIMMKHOMI{MHM{HOHN{M{SOHFKLOJ<<<JJFEFJ{IKIMQF{JIKHFF==<<<@8C@@@880..55LFO{IHHHFKHEGGKCFJJJKILOL{LLHJP{NcGMHKJ6645GFEELJJKKKJJIHFGJJ{EFOULJH{{MG{RTJ{K{{IJKH{MOMUDECDDECD?>>BJ{HFDFBFIGIIJJQUGJ{IILILK???>444555{IHF{IQJIMRMPNLLIJKH{OIIKKKKMK{LJLJGL{KSINMJNULMPKH{JJKFBBCC<<={I{{INL{PKMNGFDE{LIKQHJF66542-(((*+(''''**)**.1))*56<<==={MI9KIEFJOJ{KHMNK{{IKPGJHIH{WKF>@@??B<81.++-18BHKJMGIEMKLJ{NKHLLGJ?????EA?@@?;32.('&%&('%%%)--DH{J{JJHJG@@@AA'''''.&&&&&''''(():<=ABI=,,*)((''....64553333688=BCDBGQLHIFDDEHH{IGVLK{L{JGO{JYJJ{JGWKJHJPKNNLLHNLIHGJIH{{ICKJOG{HGKGFIKJFJ=7+++++7DEE{FG
R?<<==GA222115:<AEGMJ{IGLGJWHJO{KT{RHLMULKNU{GLKGKR{{MIJLHHQJN{GO{HKKHJHIAGDKIHJDHGFIGFFIKHHFFI<<<<;BDBDGD?>'''&&))('&&&&('''&&''(-,358?>@<=GKJPFFGECDFCD>>>BBCHFGDFGHIHFLK777////0999111?>>.---.IIHTFIEFGFHIIGGHJIIHOHGKHIHHDCFEEFJNLXA@AABGJLNJCABGDJJK{{H{ILKPMGGC@@<=>GI{L{{GNOFHG{{4334>@{HIKBA@A?AB{{L{22222?@ABA@CGEGKKGDEEDCDDEEE{EEEIJ{JMEJFJHEFGFCGILLTHGAIFIFFIHHKKGF{3222364*)(&'+++**077=CBCDD<<<<GHCBBABDDDGEHDHIGLHKMIGHJFEHF]FITHO{HGGJI{FHHHJLG{FHDDFFJKKGIJEG{GKJDDFEDFCCCDDGHFKGGI{FEFGEEJFGFLMJJJFHFFEFFGIIIILIM{LPHFCDDDEMI{GKJ{KFNGGIFHHEIFIGFKGIFDFHEEEBBCCDHGGLMGIINGEDJJ{<7778LJJEHIGEHFBIQGJFFDIFGFHHDDKEHHPHEKGI{JRFPGIGGFKG{LFRHGM{HHFICEHDGGVGJIGKFDDDCDEFGF{ILIL@@?>>BFECEEDCGC>>>?ABHIJFGIJIIFKB@@GCBCKFGEIEQIJEJKOJIQHGHDDDAB@ACECGJK{JEHFGGFHNEDECDDFEE{HELPK{NIKKKGEFIEEI{KKJFKSIHH{K{JIEGFFEHHNFIIGJ{HGF{IILJJILJ{{NLHIMNLKKNKXQJIM{KMH{IJLGIIIIJLKGFII[KFJ{JKK{HGGFGJGI{JJOKKML{KNLIJIGFI{DEG{IHLHNIIZJ{MKQJJ{H{JH{LGIHKLLGL{JQKLILK{TLPLIKG{H{IJJH{PJR{QNLQJHHMO{MMGKSMMKIKOI{TNLV{N{HHPIIOHLJMIKGKMS{DBC))))))**D@BKLM{JIGIIIHGIM{HJHHKJHFDDDDEFGFGIDD@?::EDCGIJGHJ{LHFKJGKLSLSFKKO{HKIJT{FCCKGNHDBBCHEDI{{IEELIDBA@==?@99CGGHIINMKGKKI{LTJ{JKI{H{JHKOLLJMIJIQIINPL{JNPFJGK?MILIKIH{NFVMGJGGJFHGIKGEKHK{KKLIFFHFFFIJIVLHPJHIHGKHLJ{{PIOHGFHIJI{PIJINMJRIKJ{J{JNJNHLHINGFIIJMFGJOMIIGPJJG{HKSQ{KL{{{P{IHMJLJQF111{FII{KJHJJL{LIKGNKUIKJ<;;;<@'''',''+++++,6;;822GHIJHRGIFGFEDEDDABKED{GEJFKGDGFEFGHJ{{{{JH{MIOFGFEEGEDEGGJGEEEEEEINKGJGHHGGJLKNJGKK[HHEHFG=<<<<ILHHPHKK{HKELFJNKKIEIEEGGGKIGF{KHIXHJFKGEGFDCEA@?>>?BCIGGEIFEJGKMFGFEJGFFEGDDGGIDFHGEFFFHGHCCBB@BBB@DGH@FD;=C=<<=@@00/.+,****'(&&&%$$$%&''()99<<:;9(((((312/***+,CB@FHDEDGDCBCDDII{IRKP{KKHMH{L{JEFDDE64325666888EJOK{{MI{P{KFKQLLJK{LKUMJ{LH{JLKOHOJJWFIFFH{KPHI{N{J{T{IJ{KRIGLKVGGGIHJLRIHIH{ILGVSKGJJ{QFKKGIDIHLHGQ{KQN{{{LMUIJ{_DBA@?AKJ{ILO{KLKKG{N{{GNNGLM{SL{RJILJ{L{{KMH{{TJHIDDFGELKQOMKLIG{FL{BITNMKK{KHLQIJ{LHMIMIDHK{OJ{{PJMIIKJ{JOIOIII{{L{JJ{P{HPIJNPLKMHFM{JM{{OIINJ{OML[{`LLHF{EFE{KK{KPIKRORJUQLSLJRLKIHKK{NIO{HJHJGSEGFL{I{{OKKJL{OKLNK{O{MIIJNHL{I{IMIRMK
IIHQL{J{J{NJRIJM{KJ{KFIKW{GGJHJ{PQKLMKH{{J{QK{J{{JH{M{{MIN{{LNMQ{HMKLQOIKKKK{OJNML{KJMMH{JLKILJJHG{{HGL{IMMNF{HUINFRBIFOPI{IIFHE@@FHJK00000K{LMLNIEEEFHM{IP{KIIGJ{{{M{P{KO{LNHKOILMSL{KILT{MPKOJGIPR{KIXIMMJMMH{JIIKJFE{{{FIJOOIMJDJNG{O{NF{GKNKMI{L{J{NKJNOJL\KRKHLLKQTK{EIUHGIH{KILIPHEDHGGIIMH{KLGLJIKG{UK{KI{HKM?-+FKFJOHLJLLJ{{G{{DFEJNIB:::::GJ{FIIJGJ{MKKINL{J{KHJFHHLKMIHLIQGIIJEGHII{NHGIFEJGGLP{HFI{HFJIJNILJFDECEFIJJLLIS{I{LFMI{HM{{PIEDE{JI{JKII{GJ{{FEHLI{GGMFKH{HGWTIKGILNIGJNO{O{MHJ{NJIIIHK{JOQIHQI{JKNFC94H{MGMILJ{IHKIUFFISHQLNHH{J{M{LMSKJKJUGGNJHKJNOL{H{{{IJMK{JKNOLP{MMJHMJNLLW{J[JM{I{KJKLGF4EGE>99889FHIIMHHMLS{MJKJIGJ{WKMKNUDEDDCCCDILIJN{JLVH{{ABBBCJGLIFGIG{JLLEMLMLKGFGFI{HEGIGKJ{LJIGIGJJJHGJLJ{^{HJLSIOKKU{RJHIGLMIGGDDFDMKJGGKGKJMMJCCCCCEDC{MKIKIHKSMIKIKIKHFFD77777888889?==?89BCDFFGFIFKFJNOEJK{L{NGH[OD>EKMJQELEF876444558=>{JJP{M{{KHHJJPRMILJJMH{KJH{JOJ{KLI{MKJGHDDDEELKGDDEHGMG{KJJNIJMHILJIGHSHMZH{{J{JK{J{N{{LH{LN{JJHJ{MHJKKPH{MJJLKLH{LPMHKLQGMO{LKNM{{GKFIJKMKMK{NOLJLMMKJWMLXIIHGI{IGWGMG{OKMJKGHGGFFLHKH{GKNMJPSLJ{KMM{LLI{NHMKGNTJ=<<<>H{{KM{JKJRJLLLKZOPLYK{HSJINPPH{{NKRN{LKHKILKLYMH{KLGL{VMMPOILKJ{FFFGFJTJK{JGL{IGNRLJLL{JTH{KJJFJKJM_KKKKHP{LREEJM{J{KHMKFFCF<II{OHNJ{GGIKMFFIJ^{L{D>=>...--334ADLL{NMJSIGMLKEL{G{K{{JHMTILHGGJOIIJ{DA?>?CGHHFHGBHEFDC:IIINKLPJMHR{{J{J{LI{HSLJKJSHHK{{G{H{JNNMIL{KHOJ{JKKLGKEGHOJMHGJMJMI{{{P{N{JIK{OPJKJOPOQPL{LINMJIDFGIIMH{L{{K{NK{HKJQJIJKL{JKQJIMJHLIGMEDK{OMKHGJEFTKIJ{MJKNQ{LNPKIHIHJM{IPHKIMGHPL{K{TM{ILM{LG{KJV{L{LJJ{IIIH{I{LM{K{H{KIML{KEI{IKIKKFFFLH<MKHI{L{IY{NNI{{SKNKLNHIJ{MSIK{MHFKMM{K{MK{OKHHK{J{L{K{KLWLGLLLHKJ{JJKLFIOLHKHKJ{{J{ITQINKIROLMNMOJIKKNIMNJ{JJIJOHKHMILMOKLJLOMPJJGLTNKcNNJLJKKQJHHHKLJOIILLIHUEHIGGJNMK{KJQM<<<<>EHKL{OQO{NOSPKHIILILGK{M{LIKNLOKPGIKGFKII{KJKJOHPKIJKHJNPGLPNK{IQKHRJJHLIKM{N{JPK{{GP{JHIMIKMSIMKOI{HGJKUJMPL{MK{J{{MMK{KJ{LKKN{LKGDM{HJ{JJOF{HYIGI{GILJJMKLM{MIKPMOJJQT{GFJLLFHLSHIIHG{I{{MNMOYNQXOK{KM{NIM{IK`JIJ{{KLINK{HKIBFHJ{{IMIHKKQM88888888M{HNHLMKKK{JJ{K{IMHOLIHJK{?>>98:9<JJEFIH{LKLM{JMM{L{KHKNNG{KQ{QNKHLIHM{KMJLL{LJLJH
I{C{95055AEK{MLI32223@BCDACFHJINJF+****HHKJ{{KJMJJISPB@;810.--022299DIIEHJH{I{DCECGLNLLJ{LMHGFC9999<MNOKHJJMMLIDECFHFHFIGLOI{GECEFEEEDDIKJFIGIIHIGGIJKFHKHGFHVGIHJHMJIOJHFHHIB@AABHNJJGJ{KJ{KHLIHGHEECCEKIMRKMFJPQRGHHHHJHRIMFIXHLGJK{LKIJP\KFHNFGKF{GKMIJMI{JJJOK{NRIIHA3/---..0/.//12<>>?HIGIIOFI>555500001LSOIGQHJIHOIGJIO{KHN{D{NIIN{LNL{L{SOLGMIKOMOJGKFIGGFIKLMK{MQH{ONMIKYJJHMNOIL{LMG{AACA-----?;9449888BIDFLGL{LMNI{LO{{{I{\PJHRLLM{FL{JX{LINMJJ{KHJHHJKHJ{IHGJL{KHL{J{H{N{LSJMGJ{GLK{XMKLIIJ{GLNMFIKJ{SJMIJHHHEFEHEF{HTMFC@@AH{LKMYKIIJUKIJI{{HTJFGIHM{KF{IGHG{IIHKIITH{RFIIK{NcJLRIT{JNKMLUJ{NNML{JLGJQLOJM{NIK{L{LHKRHB{J{I{{LYAB.*)'&&&'5?@{LHOJLIKLNIJKKJBJLNHK{JKGHLJHK_KHQKIG\LLJKPK{S{HWIHN{JMP{LML{HP{HI{J{L{HJN{]QH{KKLHHJOK{M{JJINMQIN{K{KLJHIL{O{KJ{JPNIMSKM{{{J{IINJKKI{LK{T{L{M{I{KKGFSFFHILJ{KHMJ{K{{KIOJLJKN{ILKGXGIM{{NIMGJSIIG{TJIFJKO{{H{IHIGGKBC@=:::EIQ{IG{{{IGP{{LNMOKG{GJ{{KOMK{MHWEA<<1:AGAHGFFIJGHLKGK{OKKNK{K{MMHIK{IT{RK{KFJ{DNMLSEDBBELIIIHIEHJ{{{J{QH{{KWKHL4444HMI{JJP{MOJRFGGLGHLLLMNIKIIDHHMMMIFHILK{JNJRIHJGIIIJLLQJGTULKHJLLH{GTKK{{HHJF{{{O{MCDCDECKJNPKOKHLIJDCBDCCJNMMRRJNKOKLVNM{LIKHHIH@C{I{MQL{QIJHDFB5554((((+6777BEBAA@EFI{J{KH{EGIOGJLL{K{L{IY]{NIKMIMQFKJRO{{IJ{HKFIIHFHIIJQ{L{JLJEEJINHLKJLHGGD73?I{GFB<<ABBCCCCIF@>:51+*++---+(('%$$$#$%%&**49:EBCC32233>>><<?>653223))('').1..//-.*)()('%$$$$$$$%%%%&&'))))((),./6/-,,((**//35;98,+((((''&%%%'&&(&&(00/.,-.//7BBCDD@?><;;;>@@:2,+)('(&&'((+-./11>???<;1))+&'(((.*(('((*012/01110-,*)''&&&)-/201287:<@ACCC88888>2,*'+****943/**(&&'***))))),7322*''''*,-)'&%&&&*))'%$$$%,-/564231...,,++231..+****+..*+55<=<>CGHCEFEA@A68EJGJCF@@D;-,,,,+*&$$###$%%(1122.,))))***+,,+((''*,.//39??@@@B=44411)((()44:?ACCBC==7543121235===;+++++++)&&&&&(&%%%&'4:>ACC====8<<99,,,+++*)&&%%%%%%$#$$$%&)+((.1**+,+)''''())-./324(((((666?@@>?<<==>CCCB9::8..-,,2<=EMKFGFEEEEGGJ{KGHIQA98-++))**+04000010//0/*+*('),+*''&'&&(02<>;87778>@?@?91/)&$#$$$$'*22210013.-----,14658887/..-10-)((*((),,**,-<><=:88-.-,/;<<;=<;799325440+++,,+('&'&%'%%%%&%$%'/.('&&&+,--,--''((4655,,*)))%%%(*+11445
8;:1-,*)*,-/1222//...,+)&&&&'*,/.,****'''',033:;::641--..//111/--.22/,..'&''',-?8554.*'-588700,+++++5,++)('&&%%&+)(((&%%%%$%%%&&'(-,.<AAA@955*'''(31111464/-+)(''&())))))*234C???@@C888763--720/.-+-0002/+,+,5;2222...,)'&'&&&()&&&'')*--<<=;;:65327500+((((./0/0133447@87:::<DLJGFHFKFFEHFFI<;;9;DCBEDCEDIDGFB?>>>=DACCBCACBGGA?GH{{JK=>=BEDED=;;;:?F533343**))***3,,-::44444:?@>=<<989<CGFFFDCBCBD@BF><99888@@BA?<:)(&%%&%&())+'&&&'888989;<;;;?DFGFHBBA?BCDB@=<9988852334ABCECCCCB@ABABBBBC??@;;>1//009...;999:;CCCAABCBCCGFIEFFEECDDIGEH@A>DED@??@@997568:**+*+?>>?DADEFCCEFGHKLEEED>>>:0000>=;=::9::;.-<9=CFG?>===?BECAA???A?BBBEEGCBCAA>=>BCBCCGHFD@;,,***0130+('&&%$$#$"###269;ABDDDCD@=861,*'%$$$$&(235;@FH@GF;;;7565>ACEKJFIINHIF<5/--2677@@=;751,+*('%%%%()**01<??DC<732/.(((0(0000,,,,-;;;53432100//..())))*3320////30001=?B?>988:;;>>=====ECCBCDGFGH<<<<<CADFD>6/077:CB;62./122230-+('''(()(./011112;59>?A=>?@ADEA9//0CFDDHGBBC>@?CEEEEFFI{GGHIJGIGEA?@887778@DGMEHFFGFHHGKDCCECFEEBDG{{REFEEDDGFOINH{HE?8422IJEIHGFHD?<?DDGLF{FEKD{D>9777@C@AAABGAA??@>?BA?>///./:76;;<@877:8789987777:&&&&(*0)**)'&&&&))**('(()*+,,-9?87776<74++***))),/00011/05555=?>BD@BBCFPO@@ABAFEC10001BB?D@?:7225:DDEGGDA966A@<<=AHEGCA???@DBBFHGGDEDEDEIFJZIEMFIFEB87775663210../01<==;:9767610,,())'389<EHHF_M{EJDDCGEEC=<<>?<9.%%%%''&&&('&&'-1;;<=@@?>641,,,-79=;9-,)(&%%'%&'459=7222545338889?DMJCDCD??BDEFCDCCCDIB=<<==BB,++++)(&&&''=><55556B:99::;@AB@A@@ADIFHGGBC@ABBBB=<<<=?CEFD=<;:9955,,,,-0'%%%&&&''271778:<@@A333345596660////882,,,,,++,++-)*221*((((().0<==PEDKHIOIJEODABDBDEJJRMBA><:6777667ABCEEA>=;:;;;>BE@@@@@DAAAACBACBA9<;<?<<=?BAA6556579<<=@6542.(('()+/))((),679?@CBCCBA??=>88876767111100...*('%%')*(13445;<=:=@B?7>;5));<6:(><*+A@>?@ACEIBBA>=96542-('''++*,120125>@A0../8976677>BBCD@DCH?===4222=>@AAAABBCCCB<'''''(**...---&&')-9<?FIGIIDFBB5555599:9HJCCC?@?>?C??<<1/*('')(')))**++0025676:?AAA99:9:75566;@?>><+''''112@@@AAFBBA>>??GF>==7210233479544556@>??>777789889DGFDAC?=;+))()*+(''()*)+/.,*)''/3,,+(&&&(,,,,,,===<>=<<<==@@BADIDDF?;8.***+-,*)('((')..
556<==@@>=>><:89;FFABA:@FEG?<64-+*'''')+,010($$$$$$%&&)**+/01435CNQGQED>=1+)%%$%%%%&+,.//++*++=;;;<<B?>>=<::::<GGEEJGIMRHGBBAAA{C=?EADFHGIGMH{DBA3DECDBAA@BA7<:9971--,*----/>?>?@=??;89867667(((()((((...,.13/+,-***++,,../001>===<<6/-**)(++--.(58:89410../9??==??<0+*)%%$$%%&*+)*/49:<<><<;88:::<:5)))/08430.)'%%&&'/01+)%%%,-249;?@AADA882,-+*)'(&&&&%%+,676443)))('&&'%&'')<@@BCEB@A;?>>>DBBFGHBCBBA@<9*)(''),,,,23669;C?CDB@ADHH<@=9/.-,'&$$$(+29??B@>;;2(%$$$$%&(-:;?@://,'''''++00?>;<?AIHIHECD<999:@==;99;?ABFFNKK@:20//1344455556>>8778*)'''(*+;=>=<<,,,,-8;=>9851+(&%%&&&'(443542*(())+845HFDD:9=<<=AAA@==>??@GFFCC4222>@9832/-,,,,'&%$##$%&*+++,***06897444)'''((/7887764,)'''%%%'((-094<<<CGBAAB<A0..-+/0BBEDED?:;92)((()*(('&&&&'&%****11=<>>>?B>=<<=A<=6321+&$$$$$%(/5<>FFFFJQKQFFJKGJLLIDEDDCD=88777:::9855434112(''''*--/2121,**+-:678>@DENIOFDFEFFFGFFIHF;9775.))('%%%%'((&%%%%'''/3<@A;9-,+,,....-+*((**.0/*)))))**)-,&$$$)-.5597779AD>=;('*'&''(22-''%%')&%$$$$%*+++,,.<@BBCBDCBCDCDGEL<F?>?222221+**+;<@DHH>;50+)'&&&%%%&+**(()).()*')5?CEFGGGLI@79&%%%&').3677;;=BA@=55545:987532,****,'''((246)))))7?BA-----578;EJHEDG{AE{FDEELL{::::8**)*++...?AACDNGH{FDBAB:99.,-1B2<<;<<@DF@@AB?@?>><97++++&&&&'/))&&&%&(&&&&%(%%&)''''&%$$%--3557<<;;888833322374334411/0097779:FFFDGFCEBFMECDEEEHGEGIGIG?>;667,899:;<>AG<?A<<@@B@>=>>?>400/**))&&%&&'(,.1057@GHKDGIGGEDD,)*<33989::,&&&&()100001/0)('''(/07>>BB?;?ABA<?==>==?>?BJB;;9:;9:<<?A?40/-..--/..//0533322933))('&%%%&&)()-)1-))*))****(('('')@AEB@?@;::::;;86998?CADEFC544458-)(,,.++**'56)100.1954445=>BEFKH;:936?=8:;9877873-()/1/-*(&&&&*,(((('')*<>B65552447(((('..2//07778<=;DBAA;:1113;@?54445A@@ABAB=01/.+)**+*/.-/*)'&&%%&&''%&'()**12111101/0184,1/+'%$$$$%&&&&'(56<><>?ACFHK?@?9:A@C{HOEDC?C@=;70//***++@ABDDDEEGJHGC{FF{HEAAA?@EEB;8887/+++,.,,,,68>BFDFFCCBABCA>??>>;;;721322+*'''&$$$&'/18;>GPHCDFDA=<))))*7@?ADHFACMD>>>>@FEDCHHFEEDCBCD>DF>>A;:855.,-79:90-,**+;>>A==>>=>@?A@DBH?=1(&&&&&()'(((((()'(/0444CCCEMIHKEEDCA?>>>BEDFCCDB@HDDAB754.*''''')*),+**,../22?AABEEBBDDDFKEEHJHJJB<=54223/
//347=84222278<BIFIAA?@DFF?<720/0025532,++)&())((((0>>EB@@>????BBA@>=;;;ACBBBBBBB@=;;:+***))-.''((%%%%&&&&'(*+.1))))*910/.+.77:;;A@FEJFEDCBEDFMGKILGCBA=C/....2+*(&%%&()**68B@11+****+*)+)*--+-/..10//*)(('$%&))+,46<BCDIGBB<:9+++++=AA666657/,+('&%%$%%%))'()&&%&)*..0=?DJDEBDCDEFHF43323355332222766556>?>>AAAEEFGFFGUI{C-----8-,,,/,,,,----.1/.,'%%%%''./0776668;<<=?BA76667@?BD{FCBCA<3331300//+))((..-++,)'((&())*CF>ADDCCE;33,**..//00.**)*''(''%%%%&$$$$#$$()-248:77755.--,+((&%&,,-?<98789<=<;9((42100/,,-/.((.)))*//,+&&'7:851****;<9870-,,++(()%('()+,234;:89.,,+,+)*--/...89:<====A@>>>@AEFEGDFHF?==632-,('((*+*'((,13333455BA><=>>@@?<===>EHFIEDCAA@@A?CBBAABBCEGHCDDFFMGHIIGNNJKKHHEEA@ACCBAG:9988:8+*(((73.=51001B10002,)(**+/:@GDFGNHHFHGHMHM{PJIIGJHJWLMFDEDEEIDCBD64440,,,3FFCBCDDFA@?@@B)))))=?>CIE@??@B>@?90..(()3566;<=>BCB;;:88::<D?===?AEFFCF@@A?3...2778622EJFKGFHHCA@@BCFG{KL\ABB@C>:54,)'&''''''())'.+*,-.BC>?>?E??>CBCCFHFGHHIGCEFDJMJHJFIFIMJJ{LGGFA6334EGFEFEI{JKIHGFEHE>@@ACHEHG{H<8876'(223))))&&&&&&%%''-*))***,+**+,-3431*)(%%%)'())))4((((*332211---)%%%%&&&%&$%*1345579<>>>{G>==<=11.--/<<764473.,*))*7)*45<A>==@55555<:888<888685.+)&%%%%%;=A5555420(''')&''(,''''''57?>??E:75./3***/+)'&%%%%)%%&19981+98/0::>=<=;;<<=BA?777678;::977553/('&&'(,-679?>?BCAGIIFJGACBBC5-,,+-.1100((((7;<;;;;CFHDDEDEA---))'&&%%%$'&%%'*.//77@CD>::9<;;;===>5.--/..667>>99965520)'''&&''&)()%%$$$%&((((&&&&(&&&&,038DGE>>=<<=81''&''%$$$$07349778??@IA@@877559996;CECFIIEFGFDHNHA@???@HI?>>>?>@==CDFI??86*((''(((3(((.-,++())()('&%&()()8765,+***''-13100//22289>><;9/08811;;;>@DFDCBA956566?B=:8787*****;;89:99=??@?@BA<980...,+&%%'&&%$###%&+..+**+**(*++**..3443351)3CABDEGCBLEIEC=>=<@?>===;==>>@B<<<<<B@ACDDB@D@=@B=<;83*))).*.++&$$##$$&'''(*+.123567?<<<<83643213.000.*,554307753,,'%$%'(*+,-74...((((**)'&&&&012>?C?DEEEFOKMJCIEC??8.-+*((('''''....==<=>878554-)&%%&'%%**+1254ADBB=))(((++*'')**+)++,111156;<BA>>@@DFGHGG???><>===75-,,''''(&&%%%%$%012999:;BBJD88843.-.,*(''''&&&'(-,-.,&$###%%%%''''/,'''',-:8BECFDDGPKIGFGC.-+))),--/00200)((*+,---
6:;@@DDC@?@A<'&%&'150+)))0///994444432226:::;;9;9=DCD=<<<<A@AA@32224554443200)'''''&&&'))))+4:DKIGPIFLHFcGEJ{VEG<;;6(((()AADGGG{PLHXLLTNIJMHMIJ@BBA@CCB@A??@ABEEHDAABAEB=<99:>@<:,8>:(''''((,022-*)(),''''((()))*1.3789AA?@(((():9:>?BABFDEEDEDCBABAAA?>=9:;ABBFEIEE{H{JFGFHIEE{GGM{FIGGFFBAABBJEBBCEBCDBBB:743-)&&&&%%%%&''-01??@@:9856<;;:;98>?@A@A><6556??ADBBBB>AAB@>>@DEEHLDC445EDFMGH{I,+++GGHHIJFSCBACBKGGEBD;110111)((((((('&%&&'*)%$$&+,,,,,,-315;>A>9899;=<:;<>BEE@:@>988><<<90+*-8;9<B??<<)))),(%%&$&+,-101/+*'&&%%&'19@BHFGFH>9852122433(((''.+*,%%'%%&(),++&&%%''$%%$$%&&&%&%%%('''&%%%%%&$%%%%%%%%)/9::;<<<G@@@==?<<=<=???=<<=@?@@A<<<<;<;;;778866730010*&&&(+*&&&'***7221.-/.,,---/&$$%%&'*+*())))))))))=>>FFAAED@@@@@EEH98***8683222288222225443.-.-+++('''%'**-),*'&&&&&%%&'(79;>@C;8888;=A=::92...,******-22&&&&')).2./77:;81-,+*(&&$$%%''=A@B@?;:98.))())'%%&'+-0;<===+++**(&$%(+//13334EHKGOLGIJMJIJMOFEFDCBBB99989=82.)&%%&&&(.01234445DGEEAAACELGE====CE@>::98310---.,.1---(((((-.-*)(&$$$$$))((***,++****''''&*.../----*+++,4:<>DGG>:9::>=?@DBBA===<<?>?CDDGFFGEIFCDCDMCC=;;<CDGJHHKC@A@?BDCEA>;4/,(((),,06553100011569=AFDEEEIEFEDEJ@@?===>==........833/-,('&%$$$%%%&&%$$$$%''''&','&&'')+'''')+2270-,'&&('((%$$$%',57:>>ECCBA?@BDDHHLKFIHCRGFHGFLIFA@??==;.....>AAA=64-,++*++,))))))*/36ECCDCEBBAACCBE>=>>=>@JEDGKDF2A@?ADFC@A@?CA8888==>???=<?:744434644/,(''**+13:=AB?BCD?===?BBCIBAFBEFKFHGGIFK{D?<::-*))-((&''%$$&&))..<>DEK=<<;;=://++(&''(989::90)&&()++,279@AA==<;<2////=FEJIK:++'('&$$$#$%)**)6@@FDCB@@BDCC@6500/+((&$$####$%&&'+,&&$$$&)%%%%(+**)(('&&&&&%&&'%$$%&&**((('(***+***+-.14688A=<<==AB?>???DAAA@AA??@DDHBC@B@AB@@A@AEFEGHIDEDFDDDEDDCBBBA9911.+,,++++)+,---4334:;BBACCAA55555<:755)((''+)*+%%'&&'(+9:747777CC>622)&&&&'-(,****+,:?II877769821445?@A=<998:=?@JIG@L@?>765211***()))*---.'''',&)))*(((()5569:A@?=<2100.*+830/;:998891.----::;DCC<;875)('&'''((&&&*,4433479AACEEF<<;;9:1(''&%%%&,./15-,,,,*8445>ACGFFEEEDGEBHEFEHEEC<;876433200(((((())());@?;;<;<>98542/-******)+,99;9<22112.,&%%&(+***-/0,+)%$$$$%&''(((()9@C@?85122
1243/..*++,>AGH@@@A?AB?('''(7713422+/++,,,7CAAAGEJGC:;;54<???B74:;2000)''',/1788<:53322334556CCAABCBA>>>??DDA;5559GKFEBAAACIEE8652-,,((((()34:<?==?;==9*,+(%%$$$%-,,++,<>?@??=776751000/--+('''')++++,355:<<>=?=<8+*'&&(((.034766:9;>811113-++*)-)),*054589ABCCD@ABEEDII666448799878:777:>=<<,&&%%%%%%%)/0100127:BB323100'''''2201225>?B</....034564&&&&&&&&''&''()++****+++88<:8@DABCDC>>420//..-(&%$$%%((()+((),+/023=<>??@EFEAABC==::::;;<==>EFIMFE===<=@AACAAABDCBAAABDDBEEEEIDHCCB=))))*DCBB?98999=<==333,+*())).778=<:;BBG@HADBB?@AA@A?>>>//.+)&%%$$%&'-./0:89<@=<7/*'(')()()((()35<<:=;88866510+*&&''(..-//-'+&(5//*+*))('()*./*)***+++((,+,'%&%%%%&&'////1::33333BGDD{HE=<;4./19@???@@@@@BB{KHCDBD21111CDDEFGJEIGDFECDD44??BLGFBAA@97772//&&%%&'(()*()))---)())*.'&&&+.27;BBCGA?@?CK{EGCEEECGGHFFBCEGGDB@@AA?<633)'('()'&'(,8===?><<=>@DG?@=?AK@DDGCIC>=<<>=94.-,,,,0101.....43*,5411400223334:<?>??;<<<;52222:2/,***)()),-.-&&&&'(((++0365+''&')///014456;=-''(78A?;43211*)**-;;<<@ECBBB=DED@FD8888Z{OJJ=<<21111@DEFIGJJISFIJ:98&&&&)*/222356432202)))*+655356/.,,,.)&&+0001:<>?ACEEEFHF?/,*****,)'09CEEBABBCHJ{{{KFEEDEDDKDEJA?;92/.++++,2::,+++,=>=A@;+*&$$%%'(()+*)()+899:::E@A<<9;UEIGFGGHDDDEDECCBICB@@CCDDCECCCDBDDBCBA@?@@?ADACACCBDD;;::::=@LIJGLGHFLCA;;7.))*,-//027HHG{{OKFE>B:;;;ENJPGKRIJKHJ{UGFKGCC=<88/)'&)'###$%38<<<<ADBBFGGFJGJA>B?HEGGJMF=;EG??>?@EHED@<86511111/0-'(((*++12153,)'&&(**-(''''('''('*'()..//(((((((**34+*+*+--.-**))*'''&&&&&&+1488==<>@??A=634311003@@BGACBDCEFFHEFO{J<;;::::555+***.,,)'

Ironically enough, when I compress a sequencing run with short reads (like the one from my original file) with my compressDna + zstandard, the compression is ~35% better than fastq.gz, but when I try to compress long reads, I get worse compression than fastq.gz. I don't know why, and that's why I stopped working on it this weekend. Would love to know if you have any theories.

My working theory is that, for long reads, the sequence and the quality string are each long enough to get within gz's block size. But I'm not sure. I can't really imagine how gz is getting better compression than 2-bit base encoding + 94-character quality encoding.
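
One way to probe this, sketched here under assumptions rather than measured on the benchmark: a fixed-width code over 94 quality values spends log2(94) ≈ 6.55 bits on every score no matter how often it occurs, while an entropy coder (gz's DEFLATE applies Huffman coding on top of LZ77 matching) spends fewer bits on common values. The Go snippet below compares the two numbers for a quality string. `zeroOrderEntropy` is a hypothetical helper, not part of dnadesign, and it ignores any context between neighboring scores.

```go
package main

import (
	"fmt"
	"math"
)

// zeroOrderEntropy returns the Shannon entropy, in bits per symbol, of the
// byte distribution in s. It looks at symbol frequencies only, so repeats
// and other context between neighboring symbols are not taken into account.
func zeroOrderEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	// First 20 characters of the quality string quoted above; swap in a
	// full read to get a more meaningful estimate.
	qual := "''333??AAFA;>>912667"

	// Cost per score of a uniform code over 94 possible quality values.
	fixed := math.Log2(94)

	fmt.Printf("fixed-width cost:   %.2f bits/score\n", fixed)
	fmt.Printf("zero-order entropy: %.2f bits/score\n", zeroOrderEntropy(qual))
}
```

If the entropy of real long-read qualities comes out well under 6.55 bits per score, that alone would let a general-purpose compressor beat a fixed-width packing, before any repeats are exploited.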

CamelCaseCam commented 8 months ago

Ah, I suspect it's because you don't have anything like run-length encoding, while gz does. So if you have repeated bases or repeated quality scores, gz can compress them. I suspect the reason gz isn't better for short reads is that there's extra metadata it has to encode.
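
For illustration, a minimal run-length encoder in Go. This is a sketch with hypothetical names (`run`, `runLengthEncode`, `runLengthDecode`), not code from this PR, and it is not what gz does internally: DEFLATE finds repeats with LZ77 back-references, which covers runs as a special case.

```go
package main

import "fmt"

// run is one maximal stretch of identical bytes.
type run struct {
	value byte
	count int
}

// runLengthEncode collapses consecutive identical bytes into (value, count) pairs.
func runLengthEncode(s string) []run {
	var runs []run
	for i := 0; i < len(s); i++ {
		// Extend the current run if this byte matches its value.
		if n := len(runs); n > 0 && runs[n-1].value == s[i] {
			runs[n-1].count++
			continue
		}
		runs = append(runs, run{value: s[i], count: 1})
	}
	return runs
}

// runLengthDecode expands the pairs back into the original string.
func runLengthDecode(runs []run) string {
	var out []byte
	for _, r := range runs {
		for j := 0; j < r.count; j++ {
			out = append(out, r.value)
		}
	}
	return string(out)
}

func main() {
	// Illustrative fragment with a long stretch of N bases.
	seq := "CTGNCNNNNNNNNNNNNNNNNGCAG"
	runs := runLengthEncode(seq)
	fmt.Println(len(seq), "symbols ->", len(runs), "runs")
	fmt.Println("round trip ok:", runLengthDecode(runs) == seq)
}
```

Runs only pay off when they are long enough to cover the cost of storing the count, which is the same trade-off as the toggle character idea earlier in the thread.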

Koeng101 commented 8 months ago

Yep, exactly.

Koeng101 commented 6 months ago

blowq isn't actually giving the compression I'd want (better than gz), so I'm closing this PR.