dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Step 2 fails due to non-finite values, ipyrad log not updating #310

Closed carlyraea closed 4 years ago

carlyraea commented 5 years ago

iPyrad is failing at Step 2

Step 2: Filtering reads
Skipping Sample orig_cladonia_demux_Cladonia_sp2; Already filtered. Use force argument to overwrite.
Skipping Sample orig_cladonia_demux_Stereocaulon_rivulorum; Already filtered. Use force argument to overwrite.
Skipping Sample orig_cladonia_demux_Cladonia_cervicornis_verticillata; Already filtered. Use force argument to overwrite.
Skipping Sample orig_cladonia_demux_Cladonia_caroliniana2; Already filtered. Use force argument to overwrite.
Skipping Sample orig_cladonia_demux_Cladonia_caroliniana3; Already filtered. Use force argument to overwrite.
Skipping Sample carly_justin_demux_Cladonia_krempelhuberi_redo; Already filtered. Use force argument to overwrite. [.....]

Then it processes reads for a moment before giving this error:

found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt

Encountered an unexpected error (see ./ipyrad_log.txt)
Error message is below -------------------------------
Cannot convert non-finite values (NA or inf) to integer

I'm not sure what the error is, because the generated ipyrad_log.txt is empty. I'm also not sure where the non-finite values would come from, unless they are Ns in the sequences themselves, though I can't see why it wouldn't be able to handle those. Here's my params file for reference:

[caan8813@shas0137 data]$ less params-data8.txt

ddrad                          ## [7] [datatype]: Datatype (see docs): rad, gbs, ddrad, etc.
AATTC, TAA                     ## [8] [restriction_overhang]: Restriction overhang (cut1,) or (cut1, cut2)
5                              ## [9] [max_low_qual_bases]: Max low quality base calls (Q<20) in a read
33                             ## [10] [phred_Qscore_offset]: phred Q score offset (33 is default and very standard)
6                              ## [11] [mindepth_statistical]: Min depth for statistical base calling
2                              ## [12] [mindepth_majrule]: Min depth for majority-rule base calling
10000                          ## [13] [maxdepth]: Max cluster depth within samples
0.85                           ## [14] [clust_threshold]: Clustering threshold for de novo assembly
0                              ## [15] [max_barcode_mismatch]: Max number of allowable mismatches in barcodes
0                              ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
35                             ## [17] [filter_min_trim_len]: Min length of reads after adapter trim
2                              ## [18] [max_alleles_consens]: Max alleles per site in consensus sequences
5, 5                           ## [19] [max_Ns_consens]: Max N's (uncalled bases) in consensus (R1, R2)
8, 8                           ## [20] [max_Hs_consens]: Max Hs (heterozygotes) in consensus (R1, R2)
4                              ## [21] [min_samples_locus]: Min # samples per locus for output
100, 100                       ## [22] [max_SNPs_locus]: Max # SNPs per locus (R1, R2)
100, 100                       ## [23] [max_Indels_locus]: Max # of indels per locus (R1, R2)
0.5                            ## [24] [max_shared_Hs_locus]: Max # heterozygous sites per locus (R1, R2)
0, 0                           ## [25] [edit_cutsites]: Edit cut-sites (R1, R2) (see docs)
0, 0, 0, 0                     ## [26] [trim_overhang]: Trim overhang (see docs) (R1>, <R1, R2>, <R2)
p, s, u, n, v                  ## [27] [output_formats]: Output formats (see docs)
                               ## [28] [pop_assign_file]: Path to population assignment file

Any help is much appreciated!

carlyraea commented 5 years ago

Sorry, the initial params file got cut off:

data8                          ## [0] [assembly_name]: Assembly name. Used to name output directories for assembly steps
/scratch/summit/caan8813/data8 ## [1] [project_dir]: Project dir (made in curdir if not present)
                               ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
                               ## [3] [barcodes_path]: Location of barcodes file
/scratch/summit/caan8813/cladonia_all_demuxed/*.fq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
denovo                         ## [5] [assembly_method]: Assembly method (denovo, reference, denovo+reference, denovo-reference)
                               ## [6] [reference_sequence]: Location of reference sequence file
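
For anyone wanting to check settings such as sorted_fastq_path or filter_adapters programmatically, here is a minimal Python sketch that reads lines of the "value ## [index] [name]: description" layout shown in this thread. The function name parse_params and the regular expression are illustrative assumptions based only on that layout; this is not ipyrad's own parser.

```python
import re

# Assumed layout, taken from the params file pasted in this thread:
#   value ## [index] [name]: description
# The value part may be empty (e.g. raw_fastq_path when unused).
PARAM_LINE = re.compile(
    r"^(?P<value>.*?)\s*##\s*\[(?P<index>\d+)\]\s*\[(?P<name>\w+)\]:"
)


def parse_params(text):
    """Return {name: value} for every line that looks like a params entry."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:  # header or blank lines simply do not match
            params[match.group("name")] = match.group("value").strip()
    return params


example = """\
data8                          ## [0] [assembly_name]: Assembly name.
                               ## [2] [raw_fastq_path]: Location of raw non-demultiplexed fastq files
/scratch/summit/caan8813/cladonia_all_demuxed/*.fq ## [4] [sorted_fastq_path]: Location of demultiplexed/sorted fastq files
0                              ## [16] [filter_adapters]: Filter for adapters/primers (1 or 2=stricter)
"""

params = parse_params(example)
print(params["sorted_fastq_path"])
print(params["filter_adapters"])
```

Values come back as strings, so a setting like filter_adapters would still need converting before any numeric comparison.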

isaacovercast commented 5 years ago

What version of ipyrad are you running? ipyrad -v

That params file looks like it's from an old version, so if you're not on 0.7.28 I would suggest updating and trying again. conda install -c ipyrad ipyrad.

You can also try rerunning step 2 with the -f (force) and -d (debug) flags, which writes more info to the log file, e.g. ipyrad -p params-data8.txt -s 2 -f -d.


carlyraea commented 5 years ago

Hey, thanks for the quick reply. I'm on an old one so I had started updating and will try rerunning right now to see. I'll try debug if that doesn't solve the problem.

carlyraea commented 5 years ago

Hello, the updated iPyrad shows the same error, although it no longer crashes the program. It also no longer mentions the non-finite error. This is the new error output:

found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt
found an error in step2; see ipyrad_log.txt

Step 3: Clustering/Mapping reads
Sample not ready for clustering. First run step2 on sample: orig_cladonia_demux_Cladonia_subtenuis
Sample not ready for clustering. First run step2 on sample: orig_cladonia_demux_Cladonia_confusa
Sample not ready for clustering. First run step2 on sample: carly_justin_demux_Cladonia_pyxidata_redo
Sample not ready for clustering. First run step2 on sample: carly_justin_demux_Cladonia_gracilis_elongata_redo
Sample not ready for clustering. First run step2 on sample: orig_cladonia_demux_Cladonia_rangiferina
Sample not ready for clustering. First run step2 on sample: carly_justin_demux_Cladonia_coccifera_redo
Sample not ready for clustering. First run step2 on sample: orig_cladonia_demux_Cladia_retipora
Sample not ready for clustering. First run step2 on sample: orig_cladonia_demux_Cladonia_sp5
Sample not ready for clustering. First run step2 on sample: orig_cladonia_demux_Cladia_aggregata

In the iPyrad log there are a ton of errors related to malformed cutadapt files, like this:

This is cutadapt 1.16 with Python 2.7.14
Command line parameters: --minimum-length 35 --max-n 5 --trim-n --output /gpfs/summit/scratch/caan8813/data8/combined_assembly_edits/orig_cladonia_demux_Cladonia_subtenuis.trimmed_R1_.fastq.gz /scratch/summit/caan8813/cladonia_all_demuxed/orig_cladonia_demux_Cladonia_subtenuis.fq
Running on 1 core
Trimming 0 adapters with at most 10.0% errors in single-end mode ...
cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found 'TTGACTACGA'

When I search the .fq file for the sequence above, it exists but not on line 1, which suggests something is happening to the file after it goes through step 1 or 2 in iPyrad. I've had this problem before, with something going wrong either in Cutadapt or in iPyrad. I reported it to the Cutadapt folks, but they think it's in iPyrad. Any help is appreciated; let me know if I can give more or other info.
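
The cutadapt message above is about basic FASTQ structure, so one way to narrow things down is to check the input file independently of both tools. The sketch below is a rough check, assuming the common four-line record layout (header starting with '@', sequence, separator starting with '+', quality string of equal length); the function names are made up for illustration and are not part of ipyrad or cutadapt.

```python
import gzip
import itertools


def open_maybe_gzip(path):
    """Open a file as text, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def first_fastq_problem(path, max_records=1000):
    """Return a description of the first malformed record, or None if none is found.

    Assumes four lines per record; multi-line FASTQ is not handled.
    """
    with open_maybe_gzip(path) as handle:
        for number in range(1, max_records + 1):
            record = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
            if not record:
                return None  # reached the end of the file cleanly
            if len(record) < 4:
                return f"record {number}: truncated ({len(record)} of 4 lines)"
            header, sequence, separator, quality = record
            if not header.startswith("@"):
                return f"record {number}: header does not start with '@': {header[:20]!r}"
            if not separator.startswith("+"):
                return f"record {number}: third line does not start with '+'"
            if len(sequence) != len(quality):
                return f"record {number}: sequence and quality lengths differ"
    return None
```

Calling first_fastq_problem on the file named in the cutadapt command line would report whether the very first record is already malformed before ipyrad touches it.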

isaacovercast commented 5 years ago

Did you look at the fastq file? Did you import sample fastq files that were already demultiplexed, or did you have raw data that you let ipyrad demultiplex? (I'd bet the first one.)

gunzip -c /gpfs/summit/scratch/caan8813/data8/combined_assembly_edits/orig_cladonia_demux_Cladonia_subtenuis.trimmed_R1_.fastq.gz | head -n 16

carlyraea commented 5 years ago

They were already demultiplexed and cutadapted. I don't want iPyrad to do any demuxing/cutadapting because that's where the problems seem to start. However, when I try to skip step 1 it also fails so I'm trying to force it. Does it seem to make a difference in quality/run if you use iPyrad to do the adapter cutting and demuxing vs. doing it yourself? I prefer to do it separately so that I can check things before loading into iPyrad.

isaacovercast commented 5 years ago

You can't really skip step 1, right, but you can load in pre-demultiplexed data. Did you load in data already using the sorted_fastq_path parameter and running step 1? This is the preferred way to import data that has already been demultiplexed. If you set filter_adapters to 0, and you already filtered for adapters, then step2 will essentially leave the data untouched.

Overall I think using ipyrad to demultiplex and filter the data is probably as good if not better than doing it by yourself. We spend a lot of time tuning the parameters for the different rad datatypes, and we do this so you shouldn't have to! Plus just letting ipyrad do it avoids all these nasty complications.

carlyraea commented 5 years ago

I am using sorted_fastq_path and have filter_adapters set to 0. I would like to use iPyrad for everything, but somewhere in the demux/cutadapt step within iPyrad the reads get messed up, which is why I've been doing it separately.

Even though I have those parameters set, iPyrad still seems to run cutadapt, as you can see in the log above: "Trimming 0 adapters with at most 10.0% errors in single-end mode ... cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found 'TTGACTACGA'". The errors it reports do NOT exist in the already-trimmed reads when I look at the originals before loading them into iPyrad.

isaacovercast commented 5 years ago

Can you show me the first 16 lines of your fastq files?

gunzip -c /gpfs/summit/scratch/caan8813/data8/combined_assembly_edits/orig_cladonia_demux_Cladonia_subtenuis.trimmed_R1_.fastq.gz | head -n 16
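
Where gunzip is not handy, roughly the same peek can be done in Python. The sketch below uses only the standard library; the helper name head is hypothetical. It checks the first two bytes for the gzip magic number before deciding how to open the file, since viewing compressed bytes without decompressing them is one common way to end up with unreadable output.

```python
import gzip
import itertools


def head(path, n=16):
    """Return the first n lines of a text file, decompressing gzip input if needed."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    # errors="replace" keeps the preview going even if some bytes are not valid text
    with opener(path, "rt", errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]
```

Printing "\n".join(head(path)) for the trimmed file should show four complete FASTQ records if the file is intact.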

isaacovercast commented 5 years ago

Steps 1 and 2 have been rock solid for quite a while. Since you were on an old version (the one that initially gave the error), I would really just recommend starting over at step 1 with the original raw data and letting ipyrad do the demux and filter. If you start over at step 1 with the -f flag, I promise it'll work on the current version.

carlyraea commented 5 years ago

[unreadable binary output omitted]

They're gibberish at the moment.

carlyraea commented 5 years ago

Whoops. Okay, I'll try that and close this if that seems to just work.

isaacovercast commented 4 years ago

All good? Please comment and reopen this issue if not.