gamcil / clinker

Gene cluster comparison figure generator
MIT License

How can I get GTF files into this type of software #27

Closed. bioinfowheat closed this issue 3 years ago

bioinfowheat commented 3 years ago

Hello,

Clinker looks awesome. However, all of my gene location information is in GFF/GTF format, which is rather common for genomic datasets.

Can Clinker take GFF/GTF format, or do you have a good method for converting GFF/GTF files to GenBank format?

thanks, C

gamcil commented 3 years ago

I actually just added support for GFFs in the previous release, but I haven't tested GTFs. You just need to have a corresponding FASTA file with the same name as the GFF, for example:

Cluster_1.gff
Cluster_1.fasta
Cluster_2.gff
Cluster_2.fasta

Then just give the GFF to clinker (it will automatically look for the FASTA):

clinker Cluster_1.gff Cluster_2.gff -p
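Before running the command above, it may help to confirm that every GFF has its FASTA sitting next to it. The following is a minimal sketch, not part of clinker: the helper names `find_fasta` and `missing_fastas` are hypothetical, and it assumes the FASTA uses the `.fasta` extension shown in the example. clinker's own lookup may accept other extensions.

```python
from pathlib import Path

# Assumption: the FASTA shares the GFF's base name and ends in ".fasta",
# as in the Cluster_1.gff / Cluster_1.fasta example. Add other suffixes
# here if your files use them; this is not clinker's own lookup logic.
DEFAULT_SUFFIXES = (".fasta",)


def find_fasta(gff_path, suffixes=DEFAULT_SUFFIXES):
    """Return the first existing sibling FASTA for a GFF file, or None."""
    gff = Path(gff_path)
    for suffix in suffixes:
        # with_suffix swaps only the final extension: Cluster_1.gff -> Cluster_1.fasta
        candidate = gff.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None


def missing_fastas(gff_paths, suffixes=DEFAULT_SUFFIXES):
    """Return the GFF paths (as strings) that have no matching FASTA."""
    return [str(p) for p in gff_paths if find_fasta(p, suffixes) is None]
```

Calling `missing_fastas(["Cluster_1.gff", "Cluster_2.gff"])` returns an empty list when both FASTA files are present, and otherwise lists the GFFs that still need one.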

Stef-cr commented 3 years ago

Hello! Thank you for the tool; this is exactly what I need. I've been playing around with other similar tools, but this one seems neat. However, I cannot get clinker to work with gbk files created using Artemis. My objective is to compare 8 fungal genomes at a specific region (a gene cluster). My gbk files look like this:

FEATURES Location/Qualifiers source <1..>38548 /organism="Ustilago maydis 521" /mol_type="genomic DNA" /strain="521" /db_xref="taxon:237631" UMAG_03114 10001..11605 /Dbxref="GeneID:23563675" /Name="UMAG_03114" /ID="gene-UMAG_03114" CDS 10001..11605 /Parent="rna-XM_011391163.1" /Dbxref="GeneID:23563675" /Dbxref="Genbank:XP_011389465.1" /orig_transcript_id="gnl|WGS:AACP|mrna.UMAG_03114" /protein_id="XP_011389465.1" /gbkey="CDS" /Name="XP_011389465.1" /locus_tag="UMAG_03114" /product="Acetyltransferase involved in MEL production" /codon_start=1 /ID="cds-XP_011389465.1" /translation="MKSNVDTVLDGYTSVPVGVLDSTLANTDILTRITLVFPSSLSLSALQESWYALVRSWPIL AARVRATPSTPSGLSYLIPTPATLESLETRSRNSASKPLEKHIVLLDQSSRSFSDYHPIV AKAVHSNLDRNNISIGGAPLVEHEKATICSNACTSWKQLIKQDQAFVTAQATKFADATTV TISFSHILGDAFTIKHIFQGWQTALNGQAVQELQDVGKDPFIKYLPKDTNDKKHKKNKKS EPAPDLPLQWFRYGLARKIKLISLLLWEVKVKKPEKTLGQYYIYLPQAKVDELMAQARSD LEQLRSSSATSATERDLNVSTFNVLFAWLLQNIHASTAIKPSKTSSVICIINAKTRPPAG HVPADYPRHQLWGGALGAPLRPLSAAEYVTLPLGQLALHIRESITEQVDPENIRKSVVMA LKHSMWKKPSGELLFFSQNPNTYWCGCTEWRSAKFHTIDFSAAATPHHDAIQPTAAPAAS VNPVAITTNMETPMTKRNRWALLGEANNGIWFTGGLTANEASNKNGFGRYIFVE" UMAG_03115 complement(11867..13726) /Dbxref="GeneID:23563676" /Name="UMAG_03115" /ID="gene-UMAG_03115" CDS complement(11867..13726) /Parent="rna-XM_011391164.1" /Dbxref="GeneID:23563676" /Dbxref="Genbank:XP_011389466.1" /orig_transcript_id="gnl|WGS:AACP|mrna.UMAG_03115" /protein_id="XP_011389466.1" /gbkey="CDS" /Name="XP_011389466.1" /locus_tag="UMAG_03115" /product="Major Facilitator invovled in MEL transport" /codon_start=1 /ID="cds-XP_011389466.1" /translation="MADEKRTSIEEPGTPMSYSTAASPELLSSSNNASALPAYPSSQTKQDKESLSHDQAVRVE PESSTPLTDSVEDNESGAKVKKDLHFWIIFSALMLIAFVAALDMTMISTALPAITANLPP STIAANWITSAFLLPMVASQPIFGGLSCSIGRKWSINSALVIFLVGSVVCATAKTFLVLV IGRGIQGLGGGGIHSMCEIIMSDLTTLRERGLFFGVIALVFAVAGFAAPVLGGVFSEHSW PWIFWINLPIGAISLVLLIIFLNIRVPLLTGKEKWQKLDLVGNAVLFGSVTAILIAVTEG GIKYRWSAWQIWVPLVVGLLGIMLFLVIEWVPNRIAPKPVFPLDLFRNRTASVAYVQTFV 
HGVIFYGVIYMVPIYFQAIKDRTPLQSAIWSFPLSAPSFPFAMGAGVLISITGKYKLLIF CGWMLMAAGIGWMTHWHVGTSKFEWAFSQVILGAGLGIMFPITLPPIQAALPASRLESAT AAYAFTRTFGAVWGITAATTIFSTQAAKNLRPYYDQLNPLGLSDFTVVAFSEQLRNLPQP IQGVVKGVYADAISDSYWLFVPLAIIGFFTTFGMKELPLPDFIKSEAKLEQKQDVTPALK SSAAHAVVNVKTEVPSTLP" UMAG_03116 complement(16497..18272) /Dbxref="GeneID:23563677" /Name="UMAG_03116" /ID="gene-UMAG_03116" CDS complement(16497..18272) /Parent="rna-XM_011391165.1" /Dbxref="GeneID:23563677" /Dbxref="Genbank:XP_011389467.1" /orig_transcript_id="gnl|WGS:AACP|mrna.UMAG_03116" /protein_id="XP_011389467.1" /gbkey="CDS" /Name="XP_011389467.1" /locus_tag="UMAG_03116" /product="Acyltransferase invovled in MEL production" /codon_start=1 /ID="cds-XP_011389467.1" /translation="MINNALRTLISSATEQSLDDVLEHEFRTAVQLQQMDQFEWHQVEADLWKRTCLGHEASAS FNQNIAHGHTELSLMTSWRVHQPSSSRITGSELELDQLVARVRQAWIQARYLRPEVGVEL DTHTDPTVAQTMCYRLLRDEESIQEWLDETFVVKRLGDPGVATPAELCAYTYNRPLATKG KKSMLYLILPRLDDEQRTAYTIWNVSHAVTDGGSLAEVFNTLFQCVIDATPSEPYDSIYT PSAFELNVLPRMPRSVVMAYRQQYQPKPEEIAKAHKVAEVNMRMITEKMGESLALMPSTS WPERKHETVCLCRELEANEVRELLKFAKQVHSGITYLASAATILSAAETFPERKASSKGA LVGMVRNARRWISATPLDASLGASTPLGSDAVFLWVPIDTHKTLEPSFSRMQELVTTARH IRHELDKHLTTPHCISSYPYVAESSIQGLNQQWSQIKAVQSPSSSSSQKEIAGIIGAQAP GFSSVGMMRIRPRFEPVSANARASGLWLERTDFTHTGRQINASPWISMFNVDGRIKLQLG FDTKFHEVEKMNQWLDRTVVWMRICAAAAATTSTSVSSTSVDATAPVFARL" UMAG_03117 complement(19975..21911) /Dbxref="GeneID:23563678" /Name="UMAG_03117" /ID="gene-UMAG_03117" CDS complement(join(19975..21443,21533..21911)) /Parent="rna-XM_011391166.1" /Dbxref="GeneID:23563678" /Dbxref="Genbank:XP_011389468.1" /orig_transcript_id="gnl|WGS:AACP|mrna.UMAG_03117" /protein_id="XP_011389468.1" /gbkey="CDS" /Name="XP_011389468.1" /locus_tag="UMAG_03117" /product="Erythritol-mannosyl-transferase invovled in MEL production" /codon_start=1 /ID="rna-XM_011391166.1:CDS" /translation="MKVALLANPARGEINVLLATAYELIRLGHDVTFLTGSSFANAIAEFRSEQNDPILAARIH FSDLGNARAVEDFTRGMQSHLKGLRKPPGDYSSMEICQIVALVTEQEFRDAATMVRDRLL 
EIEPDMIAVDALSPNLVTGCRMTGLPWMFTVPCSPSLTATRKSLFDPHPMGRRRQRTLMS ALENLKLTFIETYRNATKKDLYARRALLKNEFGLNSMGFNGDSAIVPPLWKDRNCVGGIH FNTPGLTDSIRQPHQIHFVGAGVTSDPENHDTPVFEAASFLKQLSPSSSTTFKPLFPPLS KTGRDLDVEWMDEAYAAGQLVVYLNMGSMFLWTDAEVRSCLRAFERLYEQSGGKIKLLFK LNKPKRTPNNTGASTPTAPISSPDFEEKQRTVMGSARPVGASNLVSEKLADAIKNVTPRR KTDADKGYSQFTLSEFEGLEYVRFTRWVHDQRSIYKHPALRVVIHHGGGNSFNEAVHYAL PQMILSQWFDTHEYAILAERFGIGLRSKHAPKIDENDLVNTMTRLLQGPEAEKIRRNAKV WSIRSRNAGGAPAAARLIEAQAMLFNQQKQLELAASEVRAAVDTLSGKSSVADLESETAF TPNMLSGAASTVGSD" UMAG_10635 23655..24140 /Dbxref="GeneID:23566636" /Name="UMAG_10635" /ID="gene-UMAG_10635" CDS 23655..24140 /Parent="rna-XM_011391227.1" /Dbxref="GeneID:23566636" /Dbxref="Genbank:XP_011389529.1" /orig_transcript_id="gnl|WGS:AACP|mrna.UMAG_10635" /protein_id="XP_011389529.1" /gbkey="CDS" /Name="XP_011389529.1" /locus_tag="UMAG_10635" /product="hypothetical protein" /codon_start=1 /ID="cds-XP_011389529.1" /translation="MQADKHAWKEVSPNLAQRPLVGVEKLLNYAEYYQNGNFQLSAAIHLETNLTTEKLQQRFG LAVWNVRCLLPEIGTWTVGTTGDAAVDLDNATFTAIQTVEEAQKWIDETAVIVQDGTTAE ELTILTTNHTIEPAGKQFRVYLVTNARQGSPAIIVNASHVLSGHRIAAQLCTIVQALVDA RLVSLLQAEPDPRAALRSIFVPENLARLIGKLPISLNTAYHKRFNPTEGDLENGFEKLSE RLANSALPSIGIPRLSSPATNPEYSLGTVNGEAMTIMNLRRTIGSNEYRLLIDAHKKLGI TVPSVIYACIVNSIDRRCKSNTAQDAETPGANLAYSAHAKRWLPDETFMTRSPVNMAIVL GSAYVSPDELRSKQHGCDLSIDELIELAKTIRAKQDAYLDTPHIISAMEQVGDEVSAMIA DTAIKQRQAGTDPHVALFENSPAICPPTLTSQGDIAFKRLYTAQGGSWDPEPAVAAGEYV YIGHSWNSGRTTDASVCFALVGFSGELRLTSYFDSRFFDAKLIESILDDVLSNLRMIATT VPVDAPEAKL UMAG_10636 26801..28548 /Dbxref="GeneID:23566637" /Name="UMAG_10636" /ID="gene-UMAG_10636" CDS join(26801..26911,27007..28548) /Parent="rna-XM_011391228.1" /Dbxref="GeneID:23566637" /Dbxref="Genbank:XP_011389530.1" /orig_transcript_id="gnl|WGS:AACP|mrna.UMAG_10636" /protein_id="XP_011389530.1" /gbkey="CDS" /Name="XP_011389530.1" /locus_tag="UMAG_10636" /product="Acyltransferase invovled in MEL production" /codon_start=1 /ID="rna-XM_011391228.1:CDS" SQ Sequence 38548 BP; 9387 A; 10135 C; 9985 G; 9041 T; 0 
other; atcgcttgtc ctaagcgcta ccacacgaaa gaaatctcca aacctggcta aacgaagcga 60 cgcaggtgct tcttcattcg agtgatatcg aggtccgtac atcgctttca acgcatgcca 120 gcatcgacga tgtgaccagg tgtccgttag tgtcttttga tggtgtacct cgctgcatgg 180 aatcaagaca agtcccaaac ccttccgtgc aacaagcagc agtcatcgcc tgcatctgtt 240 tgcggtccgt tccacgtcca atcaggattc atgtatgaca aggacgctca ccttgattgc 300 gaatctgagg ccactttcgc ttgcacactt ttcagcgatc cccaaaagca taaaaagggg 360 gtgtcctctg actaggtctg ctcagctctc atcaacccac ccaaccgatc tcccttcctc 420 tctctacttt gtagcacctc aaagcctaca ttctctacca cttttgaacc cgttccctgg 480 acaagatgaa gttgtccttc aacctgccgc tgctgctctc ggctgccttt atggcagcgc 540 ctgcgctggc tctgccaaac ctccacgcca tgatggctcc tggacccaac ggcgagcctt 600 cgcaccttga aaagcgtctc ctctcggcta acccgctgca cctgaacaac ctgttcaacg 660 acatcaagaa caaggttgcc gagctgaaca gcaacgtcac cgttcctgac gtgggcagcc 720 tgctcaacac caagattact gatatcacgc ccgaaagcat ctacaacacc tttggcattc 780 gacgagacct tggactcttg ccgcccgccg aagacaagta ccacccgtgg cagcctcctc 840 ccaagggtgc caagcgtgga ccttgccctg gtctcaacac aatcgccaac cacggctacc 900 ttcctcgcag cggtgtgatt aacccgatcg atctgatcgt cggtacgttc ctcggcctca 960 acctcagccc agaccttgct ggtatcctgg ctgctatctc gtttgtcggc atgggtgacc 1020 tgctgcagat gaaattgtcc atcggcggcc gatacggact tggcggcggc ttgagccacc 1080 acggtatcct cgaaggcgat gcttcagtga cgcgtaaaga taactacttt ggcaacagct 1140 gggacgctga cccgaagcta gtcaagcagt tcatccagga gaccaacact tacggcaaag 1200 gaaacgtcaa catttggtcg ctcgccaact cgcgctaccg tgcttgggat tatggccgca 1260 agaacaaccc ggttttcgac ttcaacccct ggcgaatgct cgtcgcatac ggcgagagtg 1320 gtttcgtgca cgaggttctg cgtggctctt ttgtcaagtt cgacgagacc atgatcaaga 1380 actggttcat tgatgagagg ttccccaagg gctggtccaa gaggattgtg cccatgacga 1440 ctcccgagat tcttgcttgg gccggtattg tgtttgtggc caagccgacg attcctggtt 1500 ggagcatcgg caagggcgct ttcatccccc tgcccaccac tgatggcgct taccaggagc 1560 tcaagtcgct cctcgacccc aaaaccaccg gtgccacgct cgagtcgctg ctctgcgatg 1620 ctagcaatgc tgtccttggc tttttcccct cgcagatcac caacctgctt ggcgttattg 1680 gcatcaaggg tgttggcgcc 
caattcaagt gcaaataagc gcctccaatt acggacggtg 1740 cgcaagctcc tcgtagatgg gttcccgtct ggatcccaac ttgactcacc actgttcccc 1800 tatctcacgc catggctccg gacgagcttg ctttgaccct tctcgggctc tctccggacg 1860 actacggcgc ctaacctttt ccgactcttc ctcagacggc tctgagagtt tgtcccctgc 1920 tccatattca ctattcacat ttccgcactc acgactttga ccctctttaa ttcgtttcga 1980 cttcgcgctg gaatacgcag accacataca tacatttgac cgtttaatat tttttggtac 2040 gagtcgggag tatcagttgc gctagcacct ctggtacctc ggaccttgct tgcttagaaa 2100 tagattctta atttgcgttg gaccctcgtc gtttctgtgt cagcaatggt gaatggtgaa 2160 tgtgcaggtg tgtcgcgtac gcagtcgtga gacactgcgt cgcacgaggg atggtgttaa 2220 caactcgcga agagagtcgt gattgcggct agacctcggg ggcatgagag ccgtgcgtgg 2280 tgtttgattt cgcagccgtt ctcttggcct ctcgcgcgag atgcatcggc aatattctct 2340 taatcatcta gatgaagtcg gagtagcata ttcacgattg ggtaatagga gacgtaacgc 2400 gtcgtcgccg agacaaaagc tgtaggctga gttttgcgca gttagagtgg tcaccactcg 2460 tcactcacga cttggcggct gtcagcaaac caaagtgctg agcagagaat tccggattcg 2520 tgattgttag ttgtttatta tataaatata cattatctct tctcctccat tatcttggtc 2580 atcgtcgcta gactagccat cgcctagcgc gcggattcgc acagaccttc gtgcttcgct 2640 gagctcgcgg accctgacgc gatcacgctt gtttgttctg gaatcggaat tccttgatta 2700 gcgaatcatg gtcaacccat aaccacgtct gttcctcttg taccatcact tgaccgtgca 2760 atcatcacga acctgactcg cttcgacttg gcctctgctc ggttagagtg acgaaacggg 2820 caaggcggct cgaacaagaa cagcttcgag tatttttgag caaagccggt gctgctcgct 2880 ttggcggcct acaccgctca tcacatatcc gtgattgttc ctcgtaagca ccatcctcat 2940 tggagctttg tcgagacagc acgctggtgt actttgctac tttagctgga aatactctta 3000 gaatctgtaa agcatggtac cactcgacat cgagctgctc agccgcgctg ctcctgtcaa 3060 tgtggccgag tacctacgag agcagattac gacgctttct gaggctccag tcgcgcaaag 3120 cttgttgcat ttggtcgctc ttgaagctgt gcctccagat gtcgtccgaa tctggctaag 3180 catcaccaaa gatgcgcact cgctatcggc tgcactccag cagaattcat caaatctcgc 3240 gcgaacctcg gccattcaac gcttcggagc atggtttcgt agcaaacact ggcgagcagc 3300 gtgggaggca ctgggaggcg cgccgggtat cgctcagctt ttcgcccaac tgtctgtctc 3360 gcaagttaac aagttcacgc gcatcctcgg 
tcgcagcgtg agaggaaaag caggtcaggc 3420 caagactgaa aagagagacg tagtcacgca actgctcttg cttctcgtgc cacaactcga 3480 gacaggaaac gatacaagtc ctacgcagca tctgaaagca gcagacgccg acaatcgtga 3540 gcttgtccat ctttactctg cacttctccc agcatgcaca tcgcagatgg tgcacaaggt 3600 actctttggt cctcagatgg tcatgacgct ctctggcgac caacgccact tgctcagaga 3660 acatcccaac gtctttgtgc aagacgcagc tccaacgctt ccgtccaagt caagatggcc 3720 cgatcaagaa aagcatctgc ctgccctaat cctcgctgtc atgcacaacg ctcaagtcga 3780 cgatcccgca aagctcgttt gctttctcac cgatgcgttc gagtcgatct tttccaaaga 3840 atctgcgcgc atcagctcac cgacgcctgt agcgtcgtta tgccagaaac tcgtccagcg 3900 tcgcggtctc gagccatcgc ttcgaatgca catgtttcaa gtcatcctgc gcatcctatc 3960 caagcatacg gagcgtgttg gtccattcgc agctcgtcta cacaaacagg cgcttcgttc 4020 ttggtctcga gcagctgaca tgttcgaaca catccttctg gagcttttcc acgtcagtcc 4080 ttcacatcaa cgacagcacc tcttctgtaa actgtacaaa cacgtgcctc cgaggcagag 4140 gtgggctttg ctcaagctca tggtttgtcc cgatcagcac gcctctacaa acgacctcga 4200 tgaagaacaa gttgtgcagc aactactatc ttgtcgtcat catctaataa gtcttcgcgt 4260 aggtgtcctc ttggccctgc ctacaccgca agcgttgcgc ttgatgggtc acttagcggc 4320 tcgacgtttg cgcgatcgtg tcattactgt tggtcgggct gactcgatct tgggtttcac 4380 cacagcgcac ggctatctga actttgatat ggtgcgcttc gagcttcagc gagaccttgt 4440 cacgcaagag cgcatatctg tcgccacagc agctgtgcaa gatgtgcgcg ccagggtgtc 4500 tcgcagctcg gatgcgggtg ttcgcgaaga tctggctcgc agtggcttgc actgggctat 4560 ctgcagccgc tccatcgagc tgtacacgag taccatcgta tggatgcgtc gttttgtgcg 4620 tgacgtcaca gttgcacgct cgctgttttc ggaaggcagt ctgcacaagc gcgaagcgat 4680 ccagcttctt tctggcattc ctgaggtgct ggatcaacac gtcacagctg tgcatatccg 4740 cgaccgcatt catgacgcaa acaatgccat ctgggaatgg ttcgagacgg tgcgcaaatc 4800 gttgctcgag cccagcttca gcgtccaaaa cactgcaggt gcgcgtctgc tagttcgacc 4860 agtcgtcgtc gagcgcatgc gccggatcga ccaattgcaa acacatctcg gtctagacac 4920 caatgccatc tcggaactcg tctgggacga gactgtcgat ctcctaatcc gaatcgaaaa 4980 ggcttgcatg gagcccgact atcgccgtct ggcaatgtac catatcgctg ggccgctagg 5040 cacagaactc gagccctcgt tctttgctac agatgcgctc 
agaattcgaa gcatggcggg 5100 cctctcgttt gtagaaaagc tctttgtcgc tcgcgatgcg ctctggaagc aacaccgaat 5160 cagcctcaac ccggcggtag ctcggcttcc tcgaggcatt cctcagggtc ttcccttgca 5220 tgttgcacca agtatctgga atctggacta catccttgct cgaacgccca actgcttctt 5280 gctcaataag gcgagagagg tggtgtttgc tgattcccac atcgctctcc agcctgtggc 5340 gcaagacgat gcgacaagag acgccgtcaa agattgccac caagattacc tgtttagcct 5400 gctcttgctg actaaagcgc accgaagcaa gatcggcaag cgccagacgc tagagcaagc 5460 cttcgtacat gctgttggag tgctttcgca caacaggatg acacgccaag aagcagaggc 5520 agagttaaga gatcacgcgt ttactgtttt ggtgcaatgc cacaaagtgc ctagattccc 5580 gagcttgtat gcgccatcca ggttcccgca gctgcccacc acccaagtcg acgacaggaa 5640 cagcagcatt cgaatcgaat gggatccaaa taagctcggc agccctccaa tgtttgcaga 5700 acgccagcta gagaccacag ttttcgacga gatgctcagc tgcagtgcgt ctttgaactt 5760 gcacgacgga actctgtgct actacacgtc ggaattcggt ggatacagct ttcgaacaac 5820 cccttacgtc gtgcgcgcgc gccatcaagt gagcgccgcc gacgtcgctc ttgccggttg 5880 cagaaatggc gaccacccga aagacaagat caggtgtatc cgcgaagcgc agatcgtgtg 5940 gcatctgatg ctgtttgacg agctttgtgt gaggccgaaa tccgatacta gtgcaatcac 6000 tctgccattc ccagataaag ctcatgttcg gtaccccaac atggtattga gtcgctcgtt 6060 tgtcaatgca atgcgcgcca agatggacga caacagcgcg atgcaccaca tgctcaagca 6120 tctcggcttc cagatcccag tatcgctatt acgagaattt acacaacgac tcacgactgc 6180 gcgtcgagag ggacggctgt cgaaaggcga tcagctggct actcagtttg tgcgctgttt 6240 gatggcgagc gatcgtccga gcgcagcgct tccgtttgtg ctacgagcca ttctggacaa 6300 ctcgcaagac agctcatggc atcgagtctt gttgtcgcaa agattcctcc gacaattgcc 6360 tgctcaagac acttgcaaag cagtcacgca tttggccgcc gagattctgc aacgcacagc 6420 cagcgacacc aaatccggca tcagcgtctc aaccgtcaag atgctcgcac aaatccttgg 6480 ccaagccgac gtgattcctg cttcggttgc catcgagaca ctggcgaaac ttctcgatca 6540 aaccgaccac gtcgatgttc ggtttgcagc gctctcgtcg atgctcgagc tgatttgttc 6600 caggccaagc acgcatgatg cggcagcaga gcgactcgtt ccggagcttg acagattcgt 6660 gcgcattctg gcagcaatcg acgagactcg gccaagcgag tccgaggagt catggaaaga 6720 cggagttctg ccgagcgtgt atgaaagctc gattatgccg agctcagatc 
tgggatcgat 6780 gaaaaaactg ccgaggactt tccagctagt ttccgagacg atgcaaaaaa gcgcattgtg 6840 ggcaccgatt ctcatgcaga gagtgatgct gccagctatg aagcgctcga tcgaggtaca 6900 tggacgatgg ctcgacacat tcctgcagcg ccactcgctg tctgtcgaga ctctggcgct 6960 tccggttccg ccaaccaagc tcgagatgct caacacgctc gtgacgcgcc acgctgcctt 7020 gacgccggat tggatcgtcg agctgtattg caagatgctg ctattcaacg tgcgacctgt 7080 agcgcaagtc aaacagctcc gagcccagct gaaacacacg gaccacaatg cggaaaagca 7140 ttggttggcg atcacggaaa agcctgaaac caaatcgatc ggctcgccgc cgttcttggc 7200 cgagtttgtc tacggagtag ccgagtcggc gcaggacaag aatgcgattc gtcaagtgtt 7260 gctggcacgc caatatgcta tctgcaaagc acaagaggtt ctggaagcga tgctggatga 7320 catgctcgat tcggcatgga tcgacgagca gagctttgac tgcatcatgg cacagctgcg 7380 gtacgagaac catggccaac gtagaacgcg ttgggcaagg gacgtcgcat gggagctctg 7440 tctgcgtcct atcctggaaa aagtaattgc caaggtggac agtcgacgga cgacggcttg 7500 gcagcgcgat ccagaacgaa agccgcgctt gctacccgac atgtttctgt atcgtttaga 7560 gctgctcccc tatccgcaag tcggagagtc gatcttgacg ccagacgaga agctgaacaa 7620 gtgccaaacg ttttccaggg cggttatcga acaggttaag catctagtcg gtcgagctac 7680 accttatcaa gacgagctgg aacggctgca aggtgctgtg actgactgcc ccgtccatct 7740 tcaggccgct gtagcacttg ctcttggcaa gctcgacacg ggcgatctca ccgaatgcct 7800 ctgtatcgcc atggcaatcg aattgttgca aacagccagc tgcaccgagt gcgagattgc 7860 agcgatccag tctatgctgg acgaatggag ccgcagcgaa gtcgagtgga tccgagtgca 7920 gctggacggt ctcgtgggtg cacgatcggg cagaacgatc aaggatgcca agctggagag 7980 ggcgctgcgg cagtcggaag tgggtgtcga ccaagccgac agcgacaggg atgatcgttc 8040 agacttggga gggagcctag acagttggcg accactcagc ggtgcaagca atgtcggtgt 8100 ccaatacgtg atgcaaaaca tgtaggtata gctccactgg cggcccacgt tggcagcatc 8160 gccgaccaga cccagattgc gttatgtatg gcacgttctc tgagcatcca ggaccgtata 8220 tgcgcgtcat tgtgctttca gtcaggacag atcggcgtaa ctgactgtct atctcgtgtc 8280 tagttgagtt gaaatgccag agaactcgcg agaaatgtca accatggtga tgtgatgagt 8340 acatgcgtac atatgcctcg ccgcgtcgat aagcgaccgc gacgtttggg cgaacgccta 8400 ggcgagatca tgccatccgg tggcaacgag ccattccgtg tgacgagcgt cgcacgaggt 
8460 atttgagatg gagggagcaa ctgttggagg ggtcgattct cgatatccac cagcgcccac 8520 ctcgcgcgat ctgtcatcgg tcgtatcggc agccgatcgg caatcgggca cagagtttca 8580 ccctttttct ttttctgtgg atttttttgg gggtttgttt ggagcctttt tcttcgattc 8640 tgagggtcgt gggttctttc cggggttggc tagaaaccgc gtaagatgcg cctttgggcg 8700 gctggtgtcg ggcggaccac actcgcccac cactcaccac tttcaaaccg aatcaaaatg 8760 tccgagcaac ccgattcata ttaaatattt tgcgatcgag ccgaacaaat agatctaagt 8820 cataattaca tcgttggata ttaaatattt taaatagaag aggatcactc tgcccgaatc 8880 gtgaatatcg ctcgtttatg ccagccaacg tcgagttagt ttgattccag ggcgaactca 8940 cggaattcca ggttgcacat agcgcatttg ctcacatgta tcgcatcgga gtacggagct 9000 caaagcgaaa aagcaacgtt gcagttccaa gctaacgtca agcttttcca ctagcagcag 9060 cacaagacgg tgcatattcg atatggtgtg aggcatggtt gacggaggag ttgctctagc 9120 aagtctcatg tgataaagca taagcacgtt tgacaaactc atctttgtac agaagccttg 9180 cgacggtagc gaggacgaaa acagtatatc ccgatggagc taatcacgaa tatgaataag 9240 gttagtgtaa gtctgaatca acatctgata gctgggattt gcagtgagaa tcgtgaatag 9300 tacgcgtcgt cagtgtaaat gcgtgggcca gacggatcgt gttaagcgcc gtggttcgtg 9360 atgactggag ccaggatagg aggttggtaa aatggttggc tccatccaac tcacgtgcca 9420 ctcgtccaac caaaaaagac ctgtgttact cgtgacggtc gacgctaaga aaatgcgaat 9480 tggcgattcg taattcgcac tgaaacttac ttaccgtgta acccttgcca aacaaaaccg 9540 cccaaaatca cgaataatca cgaatctgga taacttatgg attttgggaa atcgtgaatc 9600 gcgaatgcag gtgtggtgaa acacgtcatt cgaagcacca caccagcttt ctgagcataa 9660 tctgaatatt ccatccgtcg gaaccatcgt gaatcacgaa ggatgcatga tgggggaagg 9720 acatgcggat ccaagaggct gcgtcgttcg aggtggatga aagaacgcgc gataggggtc 9780 tgaatagcgg cgcgcgtcca tggatccatg agaagctgag tgacgctcaa agctgcaaca 9840 ttcgtgattg agcggctggt ttgctgcatt cgtgaatgct ggcttgctgc tggttggtca 9900 tcctaaatca ccaccaccat cagcatcaac acgccgacgt cgctcggcca tagcttggcc 9960 tgtatcgcga gcaatactgc acatcgagct gtcgctgatc atgaagagca acgtggatac 10020 tgtgctggac gggtacacga gcgtccccgt cggcgtgttg gacagcacac tggccaacac 10080 ggatatcctg actcgcatca cgcttgtgtt cccatcatcg ctgtcgctct cagccctgca 10140 
ggagtcgtgg tatgcgcttg ttcgaagctg gcccatcctc gcagctcgtg tgcgagcgac 10200 gccatccacg ccctccggcc tgtcgtactt gatccccacg cctgcaacgc tcgaatcact 10260 ggaaacccga tctcgaaact cagctagcaa gccgctggaa aaacacattg tcctgctcga 10320 ccaatcctca cgttccttct cagactacca tcccattgtc gccaaagctg tccactcgaa 10380 cctcgatcgc aacaacatca gcatcggagg agcgccgctc gtagagcacg aaaaagcgac 10440 gatttgcagc aatgcctgca ccagctggaa gcagttgatc aagcaggatc aggccttcgt 10500 cacggctcag gccaccaagt ttgccgacgc cacgacggtc accatcagct tcagccatat 10560 cctgggcgac gctttcacaa tcaaacacat cttccaaggc tggcagacgg cgttgaacgg 10620 tcaagcggtg caagagctcc aagatgtggg taaggatccg tttatcaaat atcttcccaa 10680 ggataccaac gacaagaagc acaagaaaaa caaaaagtcc gagccggctc cagatttgcc 10740 gttgcaatgg ttccgatacg ggctggcgcg caagatcaag ttgatcagcc tcctgttgtg 10800 ggaagtcaag gtaaagaaac ctgaaaagac gcttggccag tactacatct atctaccaca 10860 agctaaggtg gacgagttga tggcacaagc tcgctcggat ctcgaacagc ttcgttcctc 10920 gtctgccacg agcgcaactg agcgcgactt gaacgtcagc acgttcaacg tgctttttgc 10980 ctggttgttg cagaacatcc acgcgtccac tgcaatcaag ccaagcaaga cgagcagcgt 11040 gatttgcatt atcaatgcca agacacgccc tccagctggt catgtgccag ccgactaccc 11100 taggcatcag ttgtggggag gcgctttagg tgcgcctctg cgcccactct ctgcggcaga 11160 gtatgtgaca ctcccgctcg gtcagctagc gttgcacatc cgtgaatcga tcaccgagca 11220 agtggatccg gaaaacatac gcaagagcgt cgtcatggcg ctcaaacact ccatgtggaa 11280 gaagccgagc ggcgagttgc ttttcttctc acaaaacccc aacacttact ggtgcggatg 11340 tacagaatgg aggagcgcca agtttcacac aatcgacttt tccgctgcag ccacaccaca 11400 ccacgatgcc attcagccaa cagcagcgcc cgccgcaagc gtaaacccag ttgcaattac 11460 caccaacatg gagaccccaa tgaccaaacg caaccgatgg gcgctcctcg gcgaggcaaa 11520 taacggtata tggtttacag gtggcttgac ggcaaacgag gccagcaaca agaacggctt 11580 cggaaggtac atctttgtcg aatagcaacc taagcagtca cgtttccagc gagcctgcaa 11640 caaagatcac acagtaacac atgctgcttt ctagacaagc ccgagccgtt cgagccgttc 11700 aagcatattc tgatgaaaac aaccaaaaac cacgaatcat gcagagagga ccaagcacta 11760 atgagacatg tgtagctgag agacgaacag ataatcaaca tgagtaagga 
gacgaacaag 11820 taatcaacgt gaacaacaag atcgctgcgc acgcagtggg ctggattcaa ggcaaagtgg 11880 aaggcacttc tgtctttaca ttcacgacag cgtgtgctgc gctgctcttg agcgccggag 11940 tgacgtcttg cttctgctcg agcttggcct ctgacttgat aaagtcgggc aacgggagct 12000 ccttcatacc aaacgtggtg aagaagccga tgatggcgag cggaacaaac agccagtacg 12060 aatctgagat cgcgtcggca tatacacctt tgaccacacc ctggatgggc tgtggcagat 12120 tgcgaagttg ttccgagaac gccaccactg taaagtcgct aagacctagc gggttgagtt 12180 gatcgtagta gggtcgcaag ttcttggctg cctgcgtaga gaagatcgtg gtggccgccg 12240 tgatgcccca tacagcacca aacgtacgtg tgaacgcgta tgccgcagtg gcactttcca 12300 gtctcgacgc cggcaacgcc gcttgaatgg gtggcaacgt gatcggaaac atgatgccca 12360 atccggcccc caaaatcacc tgcgagaagg cccactcgaa tttggacgtg ccaacgtgcc 12420 agtgtgtcat ccatccaata ccggcagcca tgagcatcca gccacaaaag atgagcagct 12480 tgtatttgcc cgtgatcgag atcaagacac ctgcacccat tgcgaatggg aaagacggtg 12540 ccgacaaggg gaatgaccag atcgccgatt gcagcggcgt gcgatccttg atggcctgaa 12600 agtagatggg gaccatgtag ataacgccgt agaagataac gccgtgcaca aaggtttgca 12660 cgtaggcaac cgaggcggtg cgattacgga aaaggtccaa ggggaacaca ggcttgggcg 12720 cgatgcggtt cggaacccat tcaataacga ggaagagcat tatgcccagt agacccacga 12780 cgagcggcac ccaaatctgc caagccgacc agcgatactt gatgcctccc tcggtcacag 12840 caatcaggat cgccgtgacc gagccaaaca acacagcatt gcctaccaga tccaacttct 12900 gccatttttc cttgcctgtc agcagcggca ctcggatgtt gaggaagatg atcagcagca 12960 ccaacgagat cgcaccgatc ggcagattga tccagaagat ccacggccaa gagtgctcgc 13020 tgaaaacgcc accaagcacg ggagctgcaa aaccagcaac ggcaaagacg agcgcgatca 13080 caccgaagaa cagcccgcgc tcgcgcaacg tggtcagatc cgacatgatg atctcgcaca 13140 tggaatggat tccaccgcct cctagaccct ggataccacg accgatgacc agcacgagga 13200 acgtcttggc ggtggcgcac actacactgc cgaccagaaa gatgacgagt gctgaattga 13260 tcgaccattt gcgaccgatc gaacatgaaa gaccaccgaa gatgggctgc gaagccacca 13320 tgggcagcaa aaatgccgac gttatccagt tggcagcgat ggtcgacggc ggcaggttcg 13380 ccgtgattgc aggtaatgcg gtactgatca tggtcatgtc aagcgcagcg acaaatgcga 13440 tcaacatcaa ggccgagaag atgatccaaa 
aatgcaagtc cttcttcacc ttggcgcccg 13500 attcgttgtc ctcaacactg tccgtcaaag gtgtgctgga ttcaggctcg actctaacag 13560 cctgatcgtg cgaaaggctt tccttgtctt gcttggtttg agacgaggga taggcaggta 13620 gtgctgaagc attgttgctc gaactcagga gctcgggcga cgccgctgtc gagtagctca 13680 tcggcgtgcc gggctcctcg atcgaggtcc gtttctcgtc cgccattgta tacgatcagt 13740 gtacgatcct gttgcgagca cagaggcctc gatgcaaagt aggctctatc acgaatgaca 13800 aagaagccct cgatccaagt ctgccgcttg aagacagagt tgaagacaaa gtgttggacg 13860 atgtagatct gtaggtgtgg ctgatggttg ggaaggaagg gagagagcgt gcgaggacaa 13920 actttacagc tgctgttggg tcaactttgg aagcgcgaac tgctaaaagg atgaaaggcc 13980 gccgagatgt tcaagtgatg acttggtcgt gtgtgctgag gaagcgaatg agtgacgggt 14040 cggatatgga cgccctgaga ggagaggatt gggagctaag gaaacagcgg tgacccgaac 14100 aagattgttg tcgatacgtg ataacccgac agaagccacg ctaacacgat cgcttacact 14160 gggcacgtaa caaaggtgtt ggtactgtaa cactcttact gctgattcgt gatttggtct 14220 caagatgtgg attgttcacc acctggaggc acattgaaag aaggtcacag ccacggtagc 14280 agtcgactaa ttcagattct catcttcaca cagtcggtta gcgcgttatc tggaaaccgc 14340 agcccgaatc atgctgctgg cgaatcgtct cacacaggcg ccgcttggaa aggagcacaa 14400 tacgaggcgg gtcaccttgg cacattggcg aagccgttcg gaagtggagg caaagccttg 14460 gctgagtgtg ccatggcaaa cacaaaccaa agtacgaaca cgttttggtc gaacatcaca 14520 cacgaattgc gacacacggc agcatttgag ctgaatcggg cttcactctc gggtcgacga 14580 ttggtctggg aagacacaat gaacatacag acactgagcc tagcggcgaa gagacaatca 14640 cgaatgtaaa gcaccgtagc agaaggagct acaaccaaaa gagacaatca agcgagcgca 14700 agcgcataac cacttcttca acatggacgc agcccaggca gccagctctt caagagacga 14760 ccaagaagca gcacatgggc acacccgcac agacaacgac gatggtgcga atcgtgaatg 14820 tatcctgtga aaagatgcca ctttgcgaac tcgtgactgg tgagtattgc ttttgcacgt 14880 tatccacacg ttgactatac gctaagtcga aatcgaacga aagccagcaa cgacgtacaa 14940 tacacgaaag aatgaaatac tgtacatcgt ataacgctta gctcaaaccc acgatcagag 15000 acatgggggc taaccgtgtg aagatttgcg caaactcgac ccaaaacaag cacgaagcgc 15060 agacagctgc gtcgcttgaa tccgtgaatc acgaatcact gcgccacgcg aaaattggat 15120 ttcgggcgtc 
aatttagtct ggattttggt ttctggtttc tggattctgg attcgtgatt 15180 gtgcattttc ttgcataggc taagagcaat tctgaaaata caagtataat cacgaataag 15240 ttaacgtgta aaatgtacag tatactcatt atactgtgca tttgtctgat cggcgtggtc 15300 acgagttgct gattcggcta cctggagcgg cttcgcgtaa gaccgaggcc gatatgcaga 15360 cacaagcact accggcgaat tggaatccat gctagcctct cggacttgtg cttggatcgg 15420 tgtatcacca tgcacacaca gctgtggcct ggttcaacgt ccaagcgttt cgatcgtgtg 15480 tgatggacat ggcgcttgac ttgttgtgga tcaacagaca atggctggca agactggatc 15540 gcttgcattc acgattcatg aaaagcttac cgtgttggaa gtgtcgacgt cggagctttc 15600 tgcaaccaca tctgcagggg ggacacgggt cggggcgccc cacaacagcc gtgcatcgtc 15660 aaatgtctct gctttcgcat tcgcgttcgc gttcgccttc tcattcacag ccaaccgacc 15720 agcgtcgtcg gtgagagttt cagcgttcgc ttatcgtatt cgcgaatcac agatatgcca 15780 agctagcaac actgtccgag taggactcgc ccagttgttt tacgcttcgc ctagccgttg 15840 ccaacaaata aaccaggcgt actacgacag tggacaaata cttgctactg ctcgcttctg 15900 ttcgcgccgg cttgtgccga tttagaaggg gcggaattga atttttgaca agacgtcaag 15960 cgagctggta actttagatt cgtgattcag tctctgcacg tactgtactt ctcgccaaag 16020 attatgctgc atgtaccact atcaaagctg cttttttgtt ttttttttat ttattggtgt 16080 ttttgtgtgt ttcttgttct ctcgctgcag atctcaattt ctgttcctat tctccgagcg 16140 gcgattgaaa gctcactcct tcggaatttc aagcaggatt ctgagcgatg gacgcacggg 16200 tttcatactt tgcgccctct tctcttatct gtcagccgac cttggtttag gcgaatcgag 16260 tcgcagaaaa actaagtcgc gaatatacag taattcgtgg aaccaaccaa agcgtcaatc 16320 gttcaccgtt gcaaaataat cgtgattgtg acagaggaac cggagaaagc gacagaaaga 16380 cccatacttt gacgaggggg agggggagcg caaacgaaat ccaaccacaa acagcaagcc 16440 gaacaagaca aaaccaaacc aaacccaaat cgacaaacgt aaagtgccaa cagcgactag 16500 agacgagcaa acacaggagc ggtcgcatcg accgacgtag aagagaccga agtagaagtg 16560 gtggcggcgg ctgcggcgca aattcgcatc cagacaaccg tgcggtcaag ccactggttc 16620 atcttttcca cctcgtggaa cttggtatcg aagccaagtt gcaacttgat cctgccatca 16680 acattaaaca tgctgatcca cggagacgcg ttgatctgtc gaccggtatg cgtgaagtcg 16740 gttcgttcga gccacaagcc agaggcacgt gcgttggccg acacgggctc gaaacgtggg 
16800 cgaatcctca tcatgccgac actcgaaaag ccaggcgctt gagcaccgat gatgccagca 16860 atctcttttt ggctgcttga tgagcttggt gactgtactg ccttgatttg cgaccactgc 16920 tggttcagtc cctggatgga gctctccgcc acgtaagggt aggacgagat gcagtgagga 16980 gtagtcaagt gtttatcaag ctcgtggcga atgtgacgcg ccgtagtcac aagctcttgc 17040 atcctgctga aactaggctc gagtgtcttg tgtgtgtcga taggcaccca gaggaagacg 17100 gcgtccgaac caagaggcgt cgacgccccc aaacttgcat cgagaggcgt cgcagagatc 17160 catcgtcgag cattgcgtac cattcccacg agcgcgcctt tgctcgaagc cttgcgctcc 17220 gggaacgtct ctgccgctga cagaatcgtg gctgctgatg cgaggtacgt gatacccgaa 17280 tgaacctgct tggcaaactt gagcagctcg cgaacctcgt ttgcttccaa ctccctacac 17340 aggcagacag tctcgtgctt gcgctccggc cacgaagttg aaggcatgag tgccaaagat 17400 tcacccatct tttctgtgat cattcgcatg ttgacttcgg caaccttgtg agcctttgcg 17460 atttcctcag gcttgggctg gtactgctgt cgatacgcca tcaccacgct gcgcggcatc 17520 cggggcaaaa cattgagctc gaaagcgctt ggcgtgtaga tcgagtcgta aggctccgag 17580 ggtgtagcat cgatgacgca ctgaaagagc gtgttaaaaa cttcggcaag gcttccgcca 17640 tccgtgacgg catgcgacac attccagatg gtgtacgcag tgcgttgctc gtcatcgagc 17700 ctgggaagaa tgaggtagag catcgacttt tttcccttgg tggcaagagg acgattgtac 17760 gtgtatgcgc acagttcggc cggcgtagcc acaccgggat cgcctagacg cttcaccaca 17820 aacgtctcgt cgagccactc ttgaatcgat tcttcgtcac gtagcaggcg gtagcacatg 17880 gtctgtgcaa ctgttgggtc cgtgtgcgtg tccaactcga cacccacttc gggccgcaag 17940 tacctcgcct ggatccaggc ctgtctaacc ctcgcgacca gctggtcaag ttccaactcg 18000 gaaccagtga tgcgcgatga actgggctga tgaacgcgcc agctcgtcat taaggaaagc 18060 tccgtatggc catgggcaat gttttggttg aacgaagcag aagcttcatg cccaaggcag 18120 gtacgtttcc aaagatctgc ctctacctgg tgccattcga actgatccat ctgctgcagt 18180 tgcacagccg tcctgaactc gtgttcaaga acatcgtcca gagactgttc ggtcgccgag 18240 gaaataagtg ttcggagagc gttgttgatc atgatgtctg cgatgacgaa ccggtagatg 18300 tttacgcaag gtcgatgagt gatggtggaa ggaacatgac ccatgccaaa gaggtagatt 18360 tgaaccttac agtggctctc gagagtttgc acttggagct tttagatccg cagtgccttg 18420 ctagtctaca ggctgtcgac gttgacatgg ctctttctgt 
gtactcgcag tgggaacagg 18480 catgcctcgc agctaactgc atttggcaca agtcgcagtc tttggatcgc ttcgcttcgc 18540 ttgtgcacgg cgcggccatt ttccagcgat agctagtcaa agtcagtgca agtgaaagtg 18600 ctaatctcca acgtgtatca cgactgaggc gtacatctgt atcgcttaca gatccagcga 18660 tacctcatgc ttggactttc atatgcagcg tctttgtctt gcatgccgtg gcataatcac 18720 gaaacaccga atcgtgaatc tcgaactcga actctgaaat gtcaaacgaa atcgtatatc 18780 tgaagagttg acaagtcaga gcgcgcaagt cctgcgtgcc gctacatgga tccgtctagt 18840 tcacgttatc gtgtatgagc gacggccggt aatctaaagt ggacgcacta gctcaaattg 18900 gcttgacctc tggcaacagc tctggtgcat gacgaggctt agctagtcta tttggaacca 18960 gactagcctg agaagctcta tcaagcctct gtttcagcca atcgtgaaag cacttaaccg 19020 aatatttatt tattcacgat tctttcgatt tcttttcgag tgttttatat ctgacacaaa 19080 ctcggccacg tctcatatgc ggtcggcact aacgccgtga gaaaataccg aatgtcaggt 19140 aattaagcat aaaaaaaatc ccaaagcaga acatgcagtt ggttacagct tcgtccaaag 19200 atcgtccata cgttaacgtt aaagcgcaac aagccgattc atgcaaccga tgcatatagt 19260 catttatgaa tacagtatgt agttttttta ttttattctg ggtggataag tcggcacaac 19320 ttgactcgtc gtgatcacct cattcgttgt tttgggctgc ttcggcttgc tcctcgtaaa 19380 actgataaaa ctgatgagcg acgagcaact acatctggcc tcacattggc gtactgtaca 19440 tagagactgg aacaggcggc tcagccaccg cagctcaagc aggtggtgca ctttgactta 19500 gagtttcgct tcaatctgcc aaactcagtg gtaatcttgc ggacagcgcc gagcgtgtca 19560 cctgcaatca ccacccagtg cagcatttac ggttcgtact ccgtttctcg acagattcgc 19620 atacatatct gcaaaaggtc gtttggctga tatgcaacca accagctcct cccttgtgcg 19680 ctcctctatt ttcccttgac tgaaactcgc tccgttgcta gtgcagtgat cacagacagc 19740 tattcagcaa agtgacttcg agattccagt tcccataaca ttaaagcggt cttggtttga 19800 aaaacaggtg acgaggaggt gtggatttca gctgaatcag ctgaatcgag tatatgcgac 19860 aactttgtgg tgacaaatat atgggggggc taatagaaca gtagactaca atgagtcgtg 19920 agtgtcgagg aaatgcatga gcggagcctg aatcttggaa tatgaataag aaaatcaatc 19980 gctcccaaca gtggaagcgg cgccagacaa catattcggg gtaaaagcag tctcgctctc 20040 cagatctgcc acggacgact tgcctgaaag cgtgtcgacg gcggcacgga cctccgaggc 20100 agctagttcg agctgcttct 
gttggttgaa cagcattgcc tgcgcctcaa tgagacgagc 20160 agcagcaggt gcaccgccag cgttacggct tcgaatcgac cacacctttg cgttgcgtcg 20220 aatcttttct gcctctgggc cttggagcag acgagtcatg gtattgacca agtcgttctc 20280 atcgatctta ggagcgtgct tgctacgcag cccaatacca aagcgctcgg cgaggatggc 20340 gtactcgtgc gtgtcaaacc actgcgaaag aatcatctgg ggcagagcgt agtggactgc 20400 ctcgttgaaa gagttaccgc cgccgtggtg gatcacaact cggagcgcag ggtgcttgta 20460 gatactgcgc tggtcgtgaa cccacctggt aaagcgcacg tactcgagac cctcaaactc 20520 agagagcgta aattgagagt atcccttgtc ggcatccgtc ttgcgtcgag gcgttacgtt 20580 tttaatcgca tcagcaagct tttccgagac caggtttgaa gctccgacag gacgcgcgga 20640 tcccatcacg gttcgctgct tctcttcaaa gtccggcgac gagatgggcg cggttggtgt 20700 cgaggctccc gtgttgttgg gcgtacgttt cggcttgttg agcttgaaca agagcttgat 20760 cttgcctccg ctctgctcgt acagtcgctc aaacgcgcgc aggcacgatc gcacttcggc 20820 gtcggtccac aaaaacatgc ttcccatgtt aaggtagacc accaactgac cagctgcgta 20880 cgcttcgtcc atccattcaa cgtccaagtc tcgtccagtc ttggatagcg gaggaaacaa 20940 cggcttgaac gtggtcgagg atgacggact caactgcttt aagaagctgg ctgcctcgaa 21000 taccggggtg tcgtggttct cgggatcgct cgtcacaccc gctccaacga aatgaatctg 21060 gtggggttgt cgaatcgagt ctgtgaggcc aggcgtgttg aaatgaatgc ctccaacgca 21120 gttgcggtcc ttccacagag gcggaacaat ggcactgtcg ccgttgaagc ccatcgagtt 21180 aagaccaaac tcgtttttca ggagcgcgcg ccgagcataa aggtccttct tggtagcatt 21240 gcgataggtt tcaataaagg tgagcttgag gttctcgagc gcggacatca acgttctctg 21300 acgtcgacga cccatgggat gtgggtcaaa caggctctta cgtgtcgctg tgagagatgg 21360 cgagcacgga acggtgaaca tccatggcaa gccagtcatt cggcaaccag tgaccaggtt 21420 gggggagagt gcatcaacag caactgcaag ccgaaccaag aaaatcaaga tgacagggtt 21480 agctacgtcg gtgcagggtg gcggccatgt ccgatgtcaa aagcgtactt actcatgtca 21540 ggttcgatct cgagcaagcg gtcgcggacc attgtggctg catctcgaaa ctcttgctct 21600 gttaccaaag ctacaatctg gcagatctcc atgctgctgt agtcgcctgg tggttttcga 21660 aggcccttga gatgtgattg catgcctcgt gtaaagtctt cgacggctcg tgcgttgccc 21720 agatccgaaa agtgaattcg tgctgccagg ataggatcat tctgctcgct cctgaattcc 21780 
gcaatagcat tggcgaacga tgagccggtg aggaaggtta catcgtgtcc aagacggatc 21840 agctcgtacg ccgtcgctag gaggacattg atctctcctc gtgcagggtt ggcaaggagg 21900 gcaaccttca tggtagagga atagtgattc gaaacaggga gggggaggga gagagatggt 21960 gttaggttcc gagggcctgg tacacgacgg tgagtatata accgaaaagc cagaatggta 22020 tgggtatagt ccttggtacg tatgatgaga gatggaaatt cttgttcaga acaaatggat 22080 gtggcctatt ttatgcaaaa aacaaacatc aatggcagcc atatgttaag gaaaatggat 22140 aaggctgttg acagcgatgc agcaggactg cgaggagcgt acttctgata aaccgagcgg 22200 cacgcggcgt tataggggat gcgcaggaat gacttgttgt ctagctggcg aatctggtac 22260 tcagagcatg agagcagcaa gcaagatggt ggaagcgaga ctttagcgat tcaaggcgtg 22320 catgattctg agcgctcaac taggatagag agtcagttga taagcgtgga aggaaccgat 22380 ccaggattgc cgagcctcac ccacgctaac cacgacgaca aggtgtggca caaggagaga 22440 gcaagcttca gcacgttctg tatttctgag cacacgccgt tcgcaccgac aaggcgttgt 22500 cggagtctgt agtgatgcct ggtcaagatc ggtaaagtaa tcaagtggac gacctctgct 22560 tgtactgtat cactgaattc ctagcagaca ctaatgctga tagagaagac tagagatgcg 22620 tcaagcgaga acgaacacct gcttcgaaac acttcgcaca taactcgcgg acggaacagg 22680 aagtaaaccg tctcgacaac tcgcttcgga gcgccagcta agatgatagt agatagagac 22740 tcgtgaccgt gtttctgaac ttcctctaaa caaggatgtg aaagaatgaa ccaagattct 22800 catggcttcc aagaagggtc taggaggacg tgcataattt cgagctacgt ggatgcgcac 22860 acgagaagag cggctcaaaa gctatgtact aactactgcc gattggccat ccatgaacca 22920 ttcacgattc acgattatca gagacgtgcc acagcggcat tcgcgactct tctagcgagc 22980 tgttagtagc acaactaatg ccaatcagcc gaccagcagc tcggtccaga ctgccatctt 23040 gcaagtattc gttttgtttt cctttctctc tttggtgtat tccccccaaa ttgaccgccg 23100 ctgtgcatgg ttcagataag tctcgcaaaa ctaattcgtg aatgaagagc gaggaacgta 23160 ataagcattt tgtctcttgt ttggacaggt ttctgcgcca gacctgtgtc gtctacacgc 23220 aatttgccac ggcgttacaa atccacgatt gtgttattca caattaatac tgcactggac 23280 acggactgta ttttgtttgt gggcgcaatc gttaaggttc agccatacat cgggagcgag 23340 agttgggcgc cggcgctcct tgcgagcaca acgtacagta accgtaagcg tgaatcacga 23400 agtattgttt atttacttaa aaaatacatt cgtgattggg atcgtgattt 
tcacacgaaa 23460 cctcatggaa tttaaaaaaa aaccatttca agctaagaac cgagctagtc actgcgcttg 23520 taaaaaatct ccgacctcca aacgttgatc aatcacgaat ccagcatgca cgatttctgg 23580 ctcagttgga actctaacac gatagtgtaa aaagccacac agcgcaccag gcaccacgta 23640 cacagacaaa cagcatggca ttcacgattc acgagttacg atcctcgatc ctcgatcctc 23700 gatccacgat ccacggttca cgattcgcga ttgcgtttca gaaggccagg acgtgtgatc 23760 ttgacttcaa tcacgaatgt acagcactgc ggtataagga agcgtcaggc tgtggacggg 23820 caaccctaag ttgtagaaaa aggggcagca ggttgcgcac tttgcggaag tgggttgaac 23880 ctgtagccga agacggactc cgttacccta tcagccgtta cgcgcagatt ccgcactggt 23940 gtatggccag ctttgccgcc acaattgagg gtacttccac cactcgggac tgctcatacc 24000 gttggttcat tgcgcaacaa ctgccgcgtt gctcggcatc tgctcggcgt gctgcaggac 24060 tagtctttga tagaaacaca ggacgtgcga aaatcgcaag atctatcaca agacaaagct 24120 gttcaactca ggaatggtag catcacctat ggcactaaga tcacaccaaa cgctccagca 24180 ctcaagaagc aaggaagcca ggaacgtctt gttgaaccag ggctttgctg atgtccctcg 24240 aaacctaatc ggcgacaacc tcgattctca cggtcacatt ccgtccgtct gatgtaaatc 24300 aaatacaacg ctaaccgctg agcggtaaca atggaagcca tatggctgaa aagcgccgat 24360 gttgccgctc tactgcagta tctttacgat gagtgtccct ttttttaccg taggccggcc 24420 tacagtgcgt taggcttaga gctagacact agatgccgca cgtgctaccg ccagttctaa 24480 cacacgttgt cgctttagat tggtgcccac gctatcaaag ctctaaaatg caactcgttt 24540 cagtgcgcaa tcgttttcgt cgccgtgcaa gtcgtcattg tcggtcatta gtttttcgat 24600 tgggtatcca ctgcgaagaa ttcgatcgtg atcgagcttc atgcgatcgc aatctcattg 24660 accggttgtt gtcaactagt ggtgctcgaa gaggacaagc aaaaccatag cctccgatcc 24720 gctctattga taagtatgcc cccgttagct cgacgtcgtg tcataattca tcacgaaact 24780 cagtcatcat tgtagaactt agaggcaagc gcagcatacc agcatgaaac tgaccaatcc 24840 tggcaggtgg atcgggcttg aacccctaag aacagcgcgt cagaagcttg ctttggcgtg 24900 caaccgtaag accaaggttc cttcgagctt accgtccatg tcatcagggt tgagtttcgc 24960 gcttcggcac aattgtgtcc gtgcgctagt ctcacattca ttcgtggtag tcacagtcca 25020 agctggttct gagaaaaagg cctgtgagag attaccgagg aaaaggtgct gttgcactat 25080 tcggcaccta tctgcatcac ggcaaggtcg 
aatcgtgaat agatcgtgac ctggaaatta 25140 cgtgtttagg cctcaaccac acgagatttt tttggatccg tgtcattgat tcaactagga 25200 tccatcggtc tcgatggccg ctttcgcatc caagacgtgc attatgcgag aagagccgtt 25260 gttgagatga tcgctgggtc taaggatgtg caagccttcg cttaatacat ttacgctttg 25320 gcgatcttca tttctcggaa cgctaacaag cagcatgaag actcatgaat cataaatcat 25380 gaatcatgaa tcgttaatcg gcaatcggca atcgtgaatc gtgaatgtgt gcgatcgtaa 25440 attagctcaa gttaagtcaa ctagttgatc ggtgctcatt gtaatcacga atgttaagcg 25500 atccagtcat gcattacttt accctgtact cgtgactctg caagcgccaa cgagaataaa 25560 tgaaatgaga taccgattcc tgcgaaaata acgccgtttc tgggttgctt gcagcttttt 25620 cggtccgagt ctgtttcgga tttgtgcttc tggcgaggcg caaattgttc acattgctaa 25680 tcacgattaa ctccccgcga ttagtggtga gtgactcgta attttgcgtt agggtttcag 25740 gcttcgtggt caggttattc atcacgaatg ttagcgcgat cgcatctttc caagtcaatt 25800 ctactaaaga tcgtagcaat cctgaaatgc aaaccacagg atactgtaag aatgacacgt 25860 ttgtctatcc agtcttgaag gtgtgaaagg tgcagcaaag gctgcaaaac aaggttttct 25920 aggggccaca acaaacattc cagagtgtaa acgattttcc atccgacgga aaccagataa 25980 aaacagtcac aagtggataa taatgggctc gtgctacgta taataatttc tcacaagacc 26040 aaattcgtcg cgtgtttttt ccccgcgtat gtgcatggcg tttctatagg ttatttacac 26100 atcctctcgt tctaactgag cttagttggg gttttagaac ttctcgcgcg tacagttcaa 26160 gcaattactg tataggaggc ttaatttctt gtggaatcac gaatggcgca actaaacatc 26220 aagcaccgac gcactcataa acgttggtaa tcgttaaccc gtgtatccac gatttggctg 26280 gttacggaaa ctcaacccat tcggctaact tatgcgttat gcacaatgtg gtcgtgcaac 26340 cgcaacaatt tgtctcccaa gttaccccat tcacgattag tgcacggcgt gcttcgatcc 26400 aaccgccata gccttcattg gtacagcagc agcactggcc ttagagacca atcggctttt 26460 ttttggttcg aggctgacac ggcggattcc atgcttttgc cgttgttgat tctgtgcgga 26520 gatcgactgt gaccgggcca acttgatgat cctcgtctct gccatcgtaa aggcggcatt 26580 cgtgattacg atctacaacc ctttattcgc cccgtcttca tttcgtgggt tggtcaagtc 26640 gtctcaaaac ctgtccttcc actctcacct ttggatatat ttctcacttt ggatatattt 26700 ctcactttgg atatatttct cgcctttgga aatatttctc acctttggat atatacgaca 26760 tcttgcttgt 
acctttgttg tagcacaaat cacggttaac atgcaggcag ataaacatgc 26820 ttggaaggag gtctcgccga accttgctca gcgaccgctg gttggagtcg aaaagttgct 26880 caactacgca gaatattatc aaaacggcaa tgtaagtgca acaaccgatg gctctagtcg 26940 atacgcctgt cacttgtcac tttgaagctg actggttctg gtctttggta tgcttgtctt 27000 gcccagttcc aactgtcggc tgccattcat ctcgagacaa atttgactac tgaaaagcta 27060 caacagcgct tcggcttggc tgtgtggaac gtcagatgtc ttctgccaga gatcggcaca 27120 tggaccgtgg gtacaaccgg tgatgctgct gtggacctcg acaacgccac tttcaccgcc 27180 atccaaaccg tcgaggaggc tcaaaagtgg atcgacgaaa ccgctgtgat agttcaagat 27240 ggaaccacag ctgaagaatt gacaattctc accaccaacc acaccatcga gccggctggt 27300 aagcagtttc gtgtctatct agtcaccaac gcgcgccaag gcagccctgc gatcatcgtc 27360 aatgcctctc acgttctcag cggtcatcgc atagcggcac aactctgtac catcgttcaa 27420 gccttggtag atgccaggct cgtgtcgctc ttgcaagctg agccagaccc tcgcgctgcg 27480 ctcaggtcta tctttgtccc cgaaaacctg gctcgactaa tcggcaagct gcccatcagc 27540 cttaacacgg cgtatcacaa gagattcaac cccactgagg gcgacttgga aaatggcttc 27600 gagaagctga gcgaacgtct tgccaactcg gcattgccat ccattggcat tcctcgattg 27660 tcgtccccgg caacaaatcc ggaatactca ctcggcacgg tcaacggtga agccatgacc 27720 atcatgaatc tcaggagaac catcggttcc aacgagtatc ggttgctcat agatgctcac 27780 aaaaagcttg gtatcacggt tccctctgtt atctatgcct gcatcgtcaa cagcatcgat 27840 cgacgctgca agtcaaacac ggcgcaggat gccgaaacgc ccggggcgaa tctggcctat 27900 tcggcgcatg ccaaacgatg gttgccagac gaaaccttca tgactcgctc tcccgtcaac 27960 atggccatcg ttctcgggtc agcttacgtc tcgcccgacg agctgcgatc gaagcagcac 28020 gggtgtgatc tgagcatcga tgagctgatc gagcttgcaa agaccattcg cgccaagcag 28080 gatgcctacc tcgacactcc acacatcatc tcagcgatgg agcaggtagg cgatgaagtc 28140 tctgccatga ttgcggacac agccatcaag cagcgtcaag ccggcacgga tccacacgta 28200 gctcttttcg agaactcgcc cgccatttgc ccaccaacgc tcacatcgca gggcgacatc 28260 gccttcaaga ggctctacac tgctcagggc ggctcatggg atcctgagcc agctgtcgct 28320 gcaggagagt acgtctacat cgggcacagc tggaacagtg gccgaaccac tgacgctagc 28380 gtgtgctttg ccctggttgg cttctccggt gagctgagac tcacaagtta cttcgattct 
28440 cgcttctttg acgccaagct catcgaaagc atccttgatg acgtattgtc aaaccttcga 28500 atgatcgcca ccacagtccc cgttgacgct cccgaggcca agctctagtc actctgcgac 28560 tgttttctgt tgtgttgtcg tcgcttcgtc cttttgccca ttgctgtcac tctcagcgcc 28620 cctttgcagc tggtctttcg taggtttgaa ttttggcttt tggttggtac tcaaccgcaa 28680 tgcgtgtgcg aacgcgtttt ctgctttgtc acctccttaa cactcacgac tgacactgtt 28740 gaattatcag aaagaagcag cgatatttct gcaagcaagt gagatggttt ccagagtata 28800 tatacctggt ctggtcgcta ggtgtctacg gcaggcatgc ttgtgaattg aatctcatgt 28860 tgtcgaatga atccgccggt gaagggattc ctggcttcca ctcacgacta atcgggcact 28920 aatcacgaat ttcggtaaac cccgatcggc gatcgctgtg atctttgatc ggtcctgagc 28980 tcagtcgagc ttgattccag atcgccgtga ttcacgattg gtcctgggcg gcttcatcgg 29040 ttggaagcag tcgtgagtgt tcacaattat atcttctgct gactcaaaca actgtgttgc 29100 aaagcgtcga ggtcaagtga caccggcaaa atatgcattc tacctttagt tgcacactag 29160 agctggaggt cagcaggagt aaggcgagga gtagggcaag tacgaatgtt tactatatga 29220 ataccaaggt gggtcgagat tggttcgctt gtgcactcga ttcctcccgc gtgttgcgta 29280 tcgagccttg aagaacgcgg tgagccatgc tgtgttgggc tgatcgtcgc aaaagacggt 29340 atgagatatg agatgcgtaa ggatttctcc acagataaga tagatatgct tgctaagtta 29400 tcgttgtttg aggtcagact gctcgtctag ccttcctttt ggtgcaattg ttttcctcaa 29460 ctcccatgtc tcctgctgaa acatgtgctc ctaactcatc cgcatccctt caccgcaatc 29520 gattcatgac gagtgaccat gcttcgaaca tgatgatcac catcttgcgt tttgtttctg 29580 cccacagttg ggagtctccg ttcaacgtgt ccgcatcggc aaaagcagcc ccacattgac 29640 agaaagggtc aaggtgtaaa ggcttaaagt ccccaacttc ctggagacca ttcctaatgc 29700 aatggactat cggtggtgtg ccctactaac ccctacggcc gggtcgatat cggagtcgat 29760 atcggagtcg ataacgtcat ccgcaacatc aatggcagcc tttcttcaag ccccaagaca 29820 tgctttgtgc ttttgtactt caaagctgct ctaaggcgct accgttatca cgcaaacctg 29880 tgtcgaaact gcaatgagcc gcgttcaatg ctgtcaaggc agccatcctc attggcaacc 29940 cacgcaccag cctgatctcg cctgtactat cgattgccgg tggagaaatc ttagttttgc 30000 agccataggc gctctttcgg ccctctacaa gagagtaccc gctgattggg tagacacgac 30060 ctaggagacg tgcatcttcg cagccatacg cgccctttca 
accctctaca agagagttgg 30120 gtagacggga ccaaggagct gtgtatcttc ggcgatggcg tctgcgttcc caagcatcgt 30180 gtcggtattg cacttcaccg catccaacat gctctgcgtt gctccgtcta gcgcatcggt 30240 tctcaaattg ccatcagact gtacagtgat tcaaatattg aaatcacaga ctgaacagga 30300 ctgccctcgg actatgtatt gatatatatg gaacaagaca aggtggatgt gtgttacgcc 30360 tgaaatggag agacgaacag actctgagtg gtgtcggctt cgtccctgta tatccaagct 30420 gcgaaagagc tttcacccgc aagaccacct cccctcgaag tctgccaaat gaaccaaaac 30480 ggtatctaga cactcgatcg gacaggactt gccaccttga actcaaagta gtcacagaca 30540 tccaagacgg atcatttcaa ctaaccattt cattgaaaac aaagctccaa cctgaactcc 30600 atcacacgcc acctgcacct gcacctgcgg aaatccgcat tggtggcgtg ttgctcgaca 30660 atgaaggagt gcgacatatc tgtgcattct tttcgctgcc tgctgcgtcc gactggtgtg 30720 atacgccaac cgcactcgtg tcgttccgac tagcagcgtt ggaaaaatgc gtgatatacc 30780 tgctcttgac tcgctggcct gcagtgtgga aaaaggcttg gtgctttgac ttgagtttga 30840 gggcgaattc gttttggcca atgtcgggaa gatacttttc cagcacgcca ggatcgagga 30900 ggatgggaaa gaggatgtgc atagagcaca acaagacagg cgggacgagc agctggaacg 30960 actcagctcg ccacttgagt gggataagcg gcgaagaagc tacaaccaca gtgactagga 31020 aggcaatcat gaccatgtac acggtccaga agcgacgaag cacttctttg agcgacgatt 31080 gcttgaaatc cttgttggtg gagccccatg tcatgttgta gccaaacatg tgtgctgcga 31140 gtgccgtcat cacgtggaat gcgaggccgc tgaaaaagac tgccattgca ggcgcatgac 31200 caatcgcttg caaagcagct cgtccaaatg aggcgtggcc agaacgagca cgagcagtca 31260 ccagacccag acctccggcg agcgtgaata tcaacgtgac cgtgataaga acatcgaagg 31320 aaggcaggaa aatcggatca aggtacggca ggaacaagcc ttggatgaca aagacgacaa 31380 ccgtcacggg gaacgagatt gccaaagcat agtaggtcga caggtaggag agcgtggcga 31440 tcttgtacga agtattgacg tcacttgcga taaaagtgcg aaagagcttg ctgagtggac 31500 cgcgagtgaa ccaataccga agtgggttaa aaacgatttc tgagcagccg tacgcatatc 31560 tgttacaaca gatggaggtc agtttgtcat ctctaaaaag cgtgcagccc aacaggttta 31620 cttacttttg aaagcgtgcc gtctcatcct cgggtgtgaa tgagacacct tcggtaaagc 31680 cttggtccga gtaagtcgcc catcgaatca cgtagccttt catttgcaac ttcagcgcca 31740 tttcgaagtc ttcagaaacg 
tgatgaggcg aaaagaaggt gcgttcgcca tcggtgtcca 31800 gattggacac ctcctgtaca gctttccaac gcagaaaaac attgtgtccc atgagcggcg 31860 ccatcgctcc gttcgcgcaa atccaggaga tactgaaatt gacgcaagct gaggtaaaat 31920 agccgataaa ccgttcaaag tagtgattct gcacgtacat cacaccggag caatgctgca 31980 gagcgccaac ttcgggacat cgttccatct cagccgcagc ttcgatcaag cagtcgaccg 32040 gcatgcgtgt gtctgagtcg atcagcagaa tgtagtcgcc gatgcggatg ttgccagatg 32100 cccatacacg accgttggtc tcttcgacag cctgggcaag tgcatcgcta tatcgttgct 32160 caaggtcgcc gtcgatactg ctagtttcca gaatctcctc ggtgcgtaga gaaaggttgt 32220 aggtgaagtt gaggttggat gctttcttga aacgaccttt acgcacatag cctccttgct 32280 cgctgctgtg gcgaggacgt gctacccatc cgacgcccat ggcattgtag tatcgaatac 32340 gctcttcggc ctcttcgcga ggtagcaaac ggagaccgtc ctcagatacc acaatgttgg 32400 caacgcctcc aaccgcgcgg tatgcacgga cggcgtccat agcgctttcc aggcttggag 32460 caaggacaga cgacaagccc tccttgtagc agggcatgtt gatcgtgaac tgaggcagga 32520 cgtggtccgg aggaagaggc tcgctagcct ggccgctgta gtacaacgag ttctcgtgca 32580 tttgtcggat gggaaataag acttgagcta agcacgccgc caagttctcc cacaggaatt 32640 gcgagaccaa catgaagaac ggcaatagaa tgatgaaggc gaatcttgcg tagttgccgt 32700 ccactaaagt ttcaatcacg attttagcaa gaaagccaga catcatgaag acgtttaggc 32760 ctaagcccaa gccggtcacg agcggagcga ggtaacgaac tttgcgatcc tgctgcagct 32820 tcttttcgcc ggtgatagga tcgtcagtga ccacaggact tggaagaatg ggtttctcgg 32880 ccaagatgct ggcatcatca tcagaaccag tgttgttgcc tatgctgtga cttgatgaac 32940 tttcgtaagc acgcagggag ggcacaaagt agtgctgcgg cacctgttgg ctgtgttgag 33000 tgaaaattga ggccggcgtt aatgctccgc cgctgatgtt atcgctgtaa gatggccttg 33060 tgagtacgcc cggcggcacc gttgttaaag acggattgct cgaggaagtg atgctgtgct 33120 gtcccatgcg agcagcaagg aagcgaaaga ttcgagcatc aatgtcgcgc gccaaatttc 33180 ggagaagtcg cgggttgctg tggaacaggc ccagtgtact ctcttggcgc aggacgaact 33240 gtaaaggtgc ttggacgctg ttgttcgcgt gggtatagaa atcgatggca ctcttcgcgt 33300 cagcaaagat cgaaattgtg ttgatcatgc cacgacgacg gctgaggtaa ataacgtcgc 33360 tgctgtcgct ggcctcgtgt tgctggagca aagtcgatac gattgggttc gccgccaaca 33420 
ccagacaatt ggcggacgag acattcgctg caatgtctga cagtgcatcg cgccatttcg 33480 acaaactagc cgcgttctcg gtcggaaaga agccatcagc gaggataatg ccagatgact 33540 cgttggcgct gatccagcct tctttgatgc cagtcgaata gagcgacgaa acttgctcca 33600 tggttgctat aatgctcatc gtgatgatgg aagtggcgga caacaatcag aaccgtcgtt 33660 gttctgcctt gcttaagagg cggcctcgtc caggctcacc ctctgcttta ccagccagca 33720 cctgccgcat caagcactac agctcaactt gactatcggt tgatctcctt ttcctaccgc 33780 aatgcaccgg tcgtgtcagg cacgacagca tacgatttta tattggtgat ttctctcgta 33840 tcctgacaaa tgcacgaggt cgcaggtcgc aggtcgcagg tcgcagaacg cgggacgttc 33900 ggctcgggtt accgcatcaa acccaaactg tcaaaatacg cggtggcaat cgtgaatttg 33960 taaattcacg atttgtgcaa aataaaactc agtgactgtc gaactatgcc cacgtaggtt 34020 tgtacaccac agtcacgagt ctccaaatca accaaaactc aggaaaataa attcgtattc 34080 ttggaacaca cgggaacgaa cactccctct ccgcacaaat ctgtagatcc cagtcacgat 34140 ttggtgagta gccaacaaga gtcagcacta tataactctg aagctctggt ttgcctaagg 34200 cggtcgtctc cttgtacgaa cccgcacatt tgatcatcat gctgttcccg cagaaacaga 34260 agagcaaagc acagctagac tcggtcttgc cgccttcagg cagtcaggac aacgtgtcgg 34320 cagagactaa gagtagcgac gacggccatt ctactgactc gcgacctgtc acaatagact 34380 gtctgcccat taccaaggcg gcgcaacgcg accttcttca accaccgcac ggctttgtgt 34440 cccgacgcat tgacattgac acttggacac cgccaaatct atggagtttg tcgaagcgag 34500 gaaagcgcag caagcgcatg gattacctat tcgttctgtt tggtatccta gtcggcgttc 34560 tcgcctccgg ggcagtcatc ggactgggaa tcttcagctt catgagtgat cagcacaagt 34620 actgcctcgc tttagacgag cagtttgacg gacctctcaa caccaacatt tggtcccgtg 34680 aagtccaggt tggtggcttt ggtaacaaag aattcgagtg gaccaccgcc agctcgaaca 34740 actcctacac caaggatgga aaactctaca ttactccaac gctgactgct gacgcaatag 34800 gtgaaggcgc cgtcacaaac ggctatacgg tcaatctcac gcagtcgggc gaatgcactg 34860 caccacgggt gccctggatg gaccgaacgg acagcgaaga tattcgaatg gcctcgatcc 34920 gcaatgctga tgtcaactgt cagatcagca gcaactcaac attgggaact gtcatccctc 34980 ctgtgcagag cgcgcgattg accactaacg caagcttctc tatgcggtac ggacgagtag 35040 aagtacgagc gcgcatgccc accggtgact ggctgtggcc tgcagtttgg 
atgatgcctc 35100 gcgacagcgt ttacggcgag tggcccaaga gcggcgaaat tgatctgttc gaaggcaagg 35160 gcaatcaggc gcggtcgcgt accgaacagc tcagcaacac catgcgaagt gcgcttcatt 35220 ggggcaacga tgccactact gaccagtatc tcaagacgaa tcaagtgtct accctttggc 35280 gcaatctttg gaacgatgaa ttccacacct tcggtctcga gtgggacgag cacgggattt 35340 ggacgtggcg tgatagccgt gcgcgacggg tgctcaatgt ccgcttcaag gaacccttca 35400 tcaaccagat gcccaaggtg cagctccaaa atggaggcgg tatggtacct gcgccgaacc 35460 cttggagcaa aagtaccaac aacggtgcac ctttcgatca agacttctac cttatcctca 35520 atctggccgt cggaggcaca aacggctact ttgccgacgt aggccagcca tggagcaacg 35580 acgacccgcg cgctgctgca actttctgga gccagcgtgc cagttggctt cctagctggg 35640 gctcggtcga gaagcgatcc atggtgatcg actatgtcaa agcttggaag cggtgctaaa 35700 gtttttagtc tatctgctag gttgtgttgt ttcggtgggc actttcaagt acataccttt 35760 cccgccgtgc agtgtaagtt gacgatatac caaaactagc ttcccaatca tgagtcgtgg 35820 gtgtgcgtcg tgagtgaggg agacacgaga gtcgtgagtt gtaagtcgtg agtggcgtcg 35880 cgggttgact cgtgactcgt gtgactcacg actccgtgac ttgcgtgggt gcacgccgca 35940 ttgagaaaca tttggcgatg taacatcatc aattttcacc gtacacccac gtcgaccttc 36000 caagtggcaa agcatcgttg ctctgaggcc caaaggtgta tgtgtagttg cgtggatcgc 36060 aagggctggc gctctctctc gtttctgccg atgacctatc cccaagccac ccgcactgta 36120 cggtaaagga ctcgtgactg cacatcccac aagcagcatc gtcgttctca agcacgccac 36180 agtcatgagg tgagttgtga gtcatgagca actttttgtg tgctgaatta tcgacgcgaa 36240 ttcttatctt cagctacagt cgtgagtcat gagtactgat acaaaaatga tataaacaag 36300 agacaaatcc catgacatcg gcaccaatca cgaatgagtc tggtagctat attctaacca 36360 caacgctgcc acatcttgac gtaatctaca gccataccac gccttttggg atcattcggc 36420 caagtaggtc cccaggtttt gtctttggct tcgataaagg cacgctgagc attagggtcc 36480 gaattggacc agggaccgtt ggtaaagtag ccacccgtac caccgacggc aacattcatg 36540 atcaggtaga agtctgcaga aagagggaaa gtgtgagctg cttgctccaa tgtatttgct 36600 tgtctacgtt gtacttacac tgatcgaagg gggcagcctt gttgttcgcc gtcagccacg 36660 gattcggcgg cgggatgatg gcgcctttgt catcgcgctg gatgggctgg cgatcaatgg 36720 caaaattgtg gaacttctta ttgaggactc 
tatagttgcg tgtgttttcc catgtccaga 36780 tgccgtgctc gtcccactcg agcccgaatg tgtggaactc ttcgttgaag tactttcgcc 36840 agaggaggcg aatccccctc cagtggctct gtagattgtg agcagcatcg atgccgaaat 36900 gtagtgtact gcgcactgtg tcagaaaaag cgagattcgt gagtgtcgac attcgccgcg 36960 gcttggtatc accgtccctt tctactctgt gacttacttg cgttgctgtg cacatcatat 37020 cgtgactttg gctggtttcc gtcgttttcg aaaatgtcga tttcaccaga tcgcggccat 37080 tcaccgtaga cactgtcacg aggcatcatc caaactgccg gccagagcca gtcaccagtg 37140 ggcatgcgcg ctcgcacttc gacacgacca tagcggatag cagcagttcc ttttgttgtc 37200 aagcgcgcac tctggatcgg agggattaca gtgcccaggg ttgcgttgct gcgtgcaata 37260 cagttggtgt gtgtgctgtt gatagccacc tgcataggga cctttttgtc tatcttggtc 37320 cacttgacct cgttggttgt acaggtgccg tcggcggtca agttgacagt atagccgtcg 37380 tacacgttgg cttctccgat agcgtcggca gtgagcgtgg gcacaatgta aagcatgccg 37440 tcctcggtat acgcgttttt tggcgagtct gtggtccatt ggaactcgcc gttgccattt 37500 ccgcccaccg acacttctgg aacccagagc gacgtgttga gcgggccatc gaagtgatcc 37560 tccagtacaa gacagtattt gccttgttct ttgaagtagc tgaacagctc caacccgact 37620 cgcaaagtgc ccagcaacag gccagcgatg atcccgatca tcagcaccag acagccatgc 37680 ttgcgcgacc gtcgtcctgc gtggctttgt gtccagtaga gattgccatt gtcctctttt 37740 ttgcctttct tgatgctctc ctcgacgatt tgcgctggag ggatgcgtgt actgcgaaat 37800 tcggatgcac cggttggtcg tcgcaacatg atgacaaagt acaaggcggt gtgtgttggt 37860 gaggggcgag ggagcggagg ggggttttgt cggtgcgaaa gacacttttt gtcattcttc 37920 ttgtacagag gtcccccatc atactgggag catggctcct gctacacttg caagagctac 37980 gattcgaaca catgtagcat agactcacag cagcctgtca cgcacttggc tgtttactct 38040 acccagcctc agcctttttc gcgacaaaat aaatttccgt atcgcgggac tcaccactca 38100 cactcttttc ctgtgagcct ggagcaccaa catacgagcg ccgtatatcc aaagaaataa 38160 aataaataaa atcgtatttt ccaaatattc acgattcaca gcttcacgat cgtggattca 38220 cgtttcggaa ttgtgtagtt tggtgtcagt acccgttgca tattgcaaaa cccagcgccc 38280 catggcttgt tctatcgtac ctttccactg tgtcattctt ccggtgtaaa ggggccgttt 38340 cgcaccactg caaagcgcag agaggaaggg atggtaagcg gactacaagc gagaagctca 38400 gctgggtctc 
cgtcgctatg ccccctgaaa gatatgcaag gagcatcagg ttggggccgt 38460 acaaataggc ctgtgcctat gccgttgtat gatgcctctt cctatatttg agccccccat 38520 attcctgcaa atagatgcat gtctctgc 38548 //

gamcil commented 3 years ago

Hi @Stef-cr, could you try uploading your files as text files - when you paste the contents like that, it breaks the formatting (GenBank is picky about the spaces, which GitHub deletes) - then I can have a look. GenBank files generated by Artemis should work (see #7).

Stef-cr commented 3 years ago

Thank you for your prompt response @gamcil. I've attached the files (.gbk, .fasta and .gff, each with a .txt extension) for two organisms as a test, to see if you can spot my mistake. All files were created using Artemis.

I also ran $ clinker file1.gff file2.gff -p

However, the plot ends up empty (no features, only organism names).

Thank you!!

UMA_gbk.txt UMA_gff.txt UMA._dna.txt

PHUB_dna.txt PHUB_gbk.txt PHUB._gff.txt

gamcil commented 3 years ago

Okay I've had some time to look at your files @Stef-cr and noticed a few issues with them, but with some minor editing they work fine.

Firstly, the DNA files seem to be in EMBL format and not FASTA. clinker expects something like:

>scaffold_name
ACGTACGTACGTACGT

Where scaffold_name matches the scaffold ID in the corresponding GFF file. You can pretty quickly convert EMBL to FASTA just by opening the file in a text editor and deleting everything but the sequence and adding that header.
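For anyone who would rather script that manual edit, here is a minimal sketch of the same idea in plain Python. It assumes a single-record EMBL file whose sequence sits between the `SQ` line and the closing `//`; the function name `embl_to_fasta` and the `header` argument are made up for this example and are not part of clinker.

```python
def embl_to_fasta(embl_text, header, width=60):
    """Build a one-record FASTA string from the sequence block of an EMBL file.

    Assumption: the file holds a single record, with the sequence between
    the 'SQ' line and the terminating '//' line.
    """
    chunks = []
    in_sequence = False
    for line in embl_text.splitlines():
        if line.startswith("SQ"):
            in_sequence = True  # sequence lines start after the SQ line
            continue
        if line.startswith("//"):
            break  # end of the record
        if in_sequence:
            # Keep letters only: drops the spaces between blocks of ten
            # bases and the running base count at the end of each line.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(chunks)
    if not sequence:
        raise ValueError("no sequence block (SQ ... //) found")
    # Wrap the sequence at a fixed width under the FASTA header.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"
```

Pass the scaffold ID from the GFF as `header`, then write the returned string to a file with the same base name as the GFF.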

However, the other issue is that the name in your GFF does not match the one in the DNA (NC_026478.1 in UMA GFF but NC_026484.1 in DNA, NW_012133783.1 in PHUB GFF but NW_012133821.1 in DNA).
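A mismatch like this can be spotted before running clinker with a few lines of Python. This is only a sketch: it assumes a tab-delimited GFF with the sequence ID in the first column and FASTA headers whose first word is the ID. The helper names are hypothetical and have nothing to do with clinker's own code.

```python
def gff_seqids(gff_text):
    """Collect the sequence IDs found in column 1 of the GFF feature lines."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def fasta_ids(fasta_text):
    """Collect the first word of every FASTA header line."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def unmatched_seqids(gff_text, fasta_text):
    """Return the GFF sequence IDs that have no matching FASTA header."""
    return gff_seqids(gff_text) - fasta_ids(fasta_text)
```

An empty result means every sequence ID in the GFF has a header in the FASTA; anything else lists the IDs that need fixing.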

Also, I'm not sure if it's just GitHub renaming your files, but make sure your GFF and FASTA files have the same file name, just with the different extensions (e.g. UMA.gff and UMA.fasta).
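The file-name pairing can be checked the same way. The sketch below assumes the FASTA uses the `.fasta` extension, as in the example names above; whether clinker accepts other extensions is not covered here, and `gffs_missing_fasta` is a made-up helper name.

```python
from pathlib import PurePath


def gffs_missing_fasta(filenames, fasta_suffix=".fasta"):
    """Return the .gff file names that have no same-named FASTA alongside.

    Assumption: the partner file is '<name>' + fasta_suffix, e.g. UMA.gff
    pairs with UMA.fasta.
    """
    names = set(filenames)
    missing = []
    for name in sorted(names):
        path = PurePath(name)
        if path.suffix != ".gff":
            continue  # only GFF files need a partner
        if str(path.with_suffix(fasta_suffix)) not in names:
            missing.append(name)
    return missing
```

For example, `gffs_missing_fasta(os.listdir("."))` (after `import os`) lists the GFFs in the current directory that still need a matching FASTA.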

Once you fix these problems, clinker runs as normal: image

Stef-cr commented 3 years ago

Thank you very much @gamcil !!! Although I have a different error now:

(clinker) MacBook-Pro-de-Oscar:FIGURE_Artemis oscarmariomolina$ clinker UMA_mel.gff PHUB_mel.gff -p
[11:57:40] INFO - Starting clinker
[11:57:40] INFO - Parsing GenBank files: ['UMA_mel.gff', 'PHUB_mel.gff']
/Users/oscarmariomolina/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Seq.py:2334: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
[11:57:40] INFO - Starting cluster alignments
Traceback (most recent call last):
  File "/Users/oscarmariomolina/anaconda3/envs/clinker/bin/clinker", line 10, in <module>
    sys.exit(main())
  File "/Users/oscarmariomolina/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 208, in main
    clinker(
  File "/Users/oscarmariomolina/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 81, in clinker
    globaligner = align.align_clusters(clusters, cutoff=identity, jobs=jobs)
  File "/Users/oscarmariomolina/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 51, in align_clusters
    aligner.add_clusters(args)
  File "/Users/oscarmariomolina/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 325, in add_clusters
    self._genes[gene.uid] = gene
AttributeError: 'NoneType' object has no attribute 'uid'

I'm not sure I understand how to fix this. I read your code but could not work out which part relates to the error I'm getting. Can you spot what the problem might be? Would it be possible for you to attach the fixed files you used to create the figure you showed, so I can compare mine to yours? That might help me fix my files. Sorry if this is a silly problem and I'm just missing something.

gamcil commented 3 years ago

Not quite sure, I think this might be a bug that has already been fixed - could you try doing "pip install --upgrade clinker" to grab the latest version?
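To confirm which version is installed before and after the upgrade, a small check with Python's standard library can help. This is a sketch: it assumes the package was installed under the distribution name `clinker` (as in the pip command above), and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_version("clinker")` inside the same conda environment that runs clinker shows whether the upgrade took effect.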

I'll attach the files to this comment - they still work for me using the latest version of clinker.

PHUB.gff.txt UMA.fasta.txt UMA.gff.txt PHUB.fasta.txt

Stef-cr commented 3 years ago

Thank you very much!! It worked!! This software is a work of art! Good job!