dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

step3: cannot copy sequence with size 142 to array axis with dimension 0 when building clusters with clustmap.py #404

Closed Liut035 closed 4 years ago

Liut035 commented 4 years ago

Hi,

I've been struggling with a problem for several days. I would appreciate it if you could give me any advice.

I ran ipyrad v.0.9.52 on 164 individuals in total (~60 GB) on campus research computing resources with 40 cores and 16 GB of memory per CPU. My study uses pairgbs data with demultiplexed fastq files and a reference genome.

The error message is:

Step 3: Clustering/Mapping reads within samples
[####################] 100% 0:00:03 | indexing reference
[####################] 100% 0:35:02 | join unmerged pairs
[####################] 100% 0:10:10 | dereplicating
[####################] 100% 0:03:56 | splitting dereps
[####################] 100% 1 day, 13:36:51 | mapping reads
[####################] 100% 0:09:10 | building clusters

Encountered an Error.
Message: ValueError: cannot copy sequence with size 142 to array axis with dimension 0
Parallel connection closed.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in /apps/ipyrad/0.9.52/lib/python3.7/site-packages/ipyrad/assemble/clustmap.py in build_clusters_from_cigars(data, sample)
   2180     seq = cigared(r1.seq, r1.cigar)
   2181     start = r1.reference_start - reg[1]
-> 2182     arr1[start:start + len(seq)] = list(seq)
   2183
   2184     seq = cigared(r2.seq, r2.cigar)
ValueError: cannot copy sequence with size 142 to array axis with dimension 0

The second step completed successfully, so I don't think it caused the error in the third step. In any case, I have attached part of "s2_rawedit_stats.txt" here:

reads_raw  trim_adapter_bp_read1  trim_adapter_bp_read2  trim_quality_bp_read1  trim_quality_bp_read2  reads_filtered_by_Ns  reads_filtered_by_minlen  reads_passed_filter
  2197908                 441184                  46955                 944289                 971583                     0                         0              2195933
  2717852                 555768                  56457                1181264                1185683                     0                         0              2715321
  2122250                 441998                  41471                 934200                 969245                     0                         0              2120437
  1269833                 241032                  26781                 515238                 543954                     0                         0              1268697
   961878                 167608                  19888                 374683                 393510                     0                         0               960959
  2967810                 534852                  61622                1170237                1259612                     0                         0              2964752

In the params file, I only changed a few parameters, including:

[sorted_fastq_path]: absolute directory/*.fq.gz

[reference_sequence]: absolute directory to the genome reference

[trim_reads]: 13, 0, 8, 0

I run into this problem with both small and large datasets, and I can't find the cause. Could you please give me some suggestions?

Thank you very much!

Tong

isaacovercast commented 4 years ago

There was an off-by-one error for PE reference assemblies. This is fixed in v0.9.53 (ef5a74c), should be up on bioconda within a couple hours.
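For readers unfamiliar with this ValueError, the sketch below shows one way a NumPy slice assignment like the one on line 2182 of the traceback can fail with a destination of dimension 0: the computed start falls at or past the end of the array, so the target slice is empty. This is only an illustration, not the actual ipyrad code or the fix in ef5a74c; `place_read`, `region`, and the sizes used are hypothetical, and the exact wording of the error message depends on the NumPy version.

```python
import numpy as np


def place_read(arr, start, seq):
    """Copy the characters of seq into arr, beginning at index start."""
    arr[start:start + len(seq)] = list(seq)


# Hypothetical region of 150 sites and a 142 bp read.
region = np.zeros(150, dtype="U1")
read = "A" * 142

# A start inside the region works: the slice has room for all 142 bases.
place_read(region, 0, read)

# A start at (or beyond) the end of the region selects an empty slice,
# so NumPy cannot fit the 142 bases into it and raises ValueError.
try:
    place_read(region, len(region), read)
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```

A miscomputed offset or region length of this kind is consistent with the traceback above, but the thread does not say which of the two was off by one.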

Liut035 commented 4 years ago

Hi,

Thanks for the reply. I reran ipyrad with the same dataset and the same parameters and encountered another problem. Here is the error message:


ipyrad [v.0.9.52] Interactive assembly and analysis of RAD-seq data


Parallel connection | c8a-s21.ufhpc: 28 cores

Step 3: Clustering/Mapping reads within samples
[####################] 100% 0:00:03 | indexing reference
[####################] 100% 0:45:50 | join unmerged pairs
[####################] 100% 0:13:42 | dereplicating
[####################] 100% 0:05:05 | splitting dereps
[####################] 100% 2 days, 3:29:00 | mapping reads

Encountered an Error.
Message: IPyradError: b'[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes\nsamtools sort: truncated file. Aborting\n'
Parallel connection closed.
---------------------------------------------------------------------------
IPyradError                               Traceback (most recent call last)
in /apps/ipyrad/0.9.52/lib/python3.7/site-packages/ipyrad/assemble/clustmap.py in mapping_reads(data, sample, nthreads, altref)
   1970     error3 = proc3.communicate()[0]
   1971     if proc3.returncode:
-> 1972         raise IPyradError(error3)
   1973     proc2.stdout.close()
   1974
IPyradError: b'[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes\nsamtools sort: truncated file. Aborting\n'

Is it also an off-by-one error? Will it be fixed in v.0.9.53?

Thanks! Tong

isaacovercast commented 4 years ago

Well it says truncated file, so I'm going to guess that you ran out of disk space. Make sure you have lots and lots of free disk space. Step 3 generates a ton of intermediate files that you need room for.

Liut035 commented 4 years ago

How much space do you think I will need? My raw data is 60 GB, and after the second step I got a 50 GB edits file. In the third step, the files seem to grow to more than 600 GB.

Thanks

isaacovercast commented 4 years ago

It's hard to say exactly. 100x the size of the raw data should be a comfortable upper bound. 10x the size of the raw data is probably not enough.
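Applied to the 60 GB of raw data in this thread, those rules of thumb put the "probably not enough" mark at 600 GB and the comfortable upper bound at 6000 GB (6 TB). The following is a minimal sketch of checking free space against them before starting step 3. The function name `disk_headroom`, its labels, and the choice to check the current directory are assumptions for illustration; ipyrad itself does not provide this check.

```python
import shutil

GB = 10**9


def disk_headroom(raw_bytes, free_bytes, low=10, high=100):
    """Classify free space against the 10x / 100x rules of thumb."""
    if free_bytes >= raw_bytes * high:
        return "comfortable"
    if free_bytes > raw_bytes * low:
        return "uncertain"
    return "probably not enough"


# Free space on the filesystem holding the current directory
# (assumed here to be the project directory).
free = shutil.disk_usage(".").free
print(f"{free / GB:.0f} GB free:", disk_headroom(60 * GB, free))
```

Anything between the two marks is labeled "uncertain" because the thread gives no guidance for that range.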

Liut035 commented 4 years ago

Got it. Thank you very much!