Closed carlahurt closed 3 years ago
Hello. Whoops, this is a very small error in the string comparison. I fixed this (333779d) and pushed a new version of ipyrad (0.9.78) which should be up on bioconda within the next 24 hours. Give it a try and let me know how it goes.
Thank you for your help with this. I added the fix from your post and made it a bit further in step 3 and hit another python error:
Something failed during step 3; the mapping reads step should typically take longer than 21 seconds. The error message indicates that one or more of the bam files failed to index. I'm guessing what happened is you ran out of disk space at some point and things started to fail silently. Can you verify that you have sufficient disk space for performing the assembly? Typically this will require hundreds of GB of free space, but again, the amount of required space will depend on the amount of raw data you have.
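For anyone wanting to check free space from Python rather than by hand, here is a minimal sketch using only the standard library. The helper name `free_space_gb` is hypothetical, and the two locations checked (the current directory standing in for the ipyrad project_dir, plus the system temp directory) are assumptions about where files might be written, not a statement of where ipyrad actually writes them.

```python
import shutil
import tempfile


def free_space_gb(path="."):
    """Return the free space, in GB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9


if __name__ == "__main__":
    # "." stands in for the project_dir; the temp directory is an assumed
    # second place worth checking on a cluster node.
    for location in (".", tempfile.gettempdir()):
        print(f"{location}: {free_space_gb(location):.1f} GB free")
```

Running this on the compute node itself (not only the login node) gives the number that matters, since the two can mount different filesystems.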
Hi Isaac,
I have 17 TB of free space available to my home directory, no quotas are in place. Would any of the files be written in another location if I am running the script from inside my home directory? The HPC administrator has not seen any of the compute node /tmp disks filling up.
I tried to run this again after your update.
It looked like it finished step 3, but when I tried to proceed to step 4 it failed because none of the files were ready:
After updating the conda environment we are receiving a warning that the latest version of HDF5 (1.10.7) is incompatible with the version that ipyrad was built on (1.10.6). We can get around this by setting "HDF5_DISABLE_VERSION_CHECK=1". We don't think this is related to the problem.
Thank you for your help!
Hello, well the current version of ipyrad is 0.9.78, and from the screenshot it looks like you're still not on the most recent version, so if you can update and try again that would be great. It might be best to create a new conda environment and install fresh inside this new environment, I bet this will solve the hdf5 error as well. Good luck.
Hi Isaac,
We updated the conda environment and ipyrad to 0.9.78. We are still receiving the following error message in step 3 (after the mapping reads step), which appears to be related to an index for the bam files:
'''
Parallel connection closed.
ValueError
Traceback (most recent call last)
After consulting with our HPC expert, this was not likely due to disk limitations. Do you have any other suggestions to complete step 3?
Thanks again, Carla
Hi Carla,
It's possible that some files in the _refmapping directory from a previous run haven't been cleaned up. You might try removing the full _refmapping directory from the project_dir, and then try running step 3 again with the -f flag, to force overwriting. Let me know how it goes.
-isaac
PS - When posting it's really great and helpful when you include screenshots of the complete output of the ipyrad run and also the full error message.
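The cleanup suggested above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, it assumes the stale directories sit directly under the project_dir and have names ending in `_refmapping`, and it defaults to a dry run so nothing is deleted until you ask for it.

```python
import shutil
from pathlib import Path


def find_refmapping_dirs(project_dir):
    """List directories directly under project_dir named '*_refmapping'."""
    return sorted(p for p in Path(project_dir).glob("*_refmapping") if p.is_dir())


def remove_refmapping_dirs(project_dir, dry_run=True):
    """Remove stale *_refmapping directories and return the paths found.

    With dry_run=True (the default) the directories are only reported.
    """
    found = find_refmapping_dirs(project_dir)
    for path in found:
        if not dry_run:
            shutil.rmtree(path)
    return found
```

After a real (non-dry) run, step 3 would be rerun with the force flag, along the lines of `ipyrad -p <your params file> -s 3 -f`.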
Hi Isaac, I deleted the old files and renamed the output folder so that overwriting wasn't an issue. I also included the -f flag.
I am still encountering an error related to an index. Please let me know if there is any additional information that might be helpful.
Something isn't right, the mapping step is going way too fast. Can you post the full results of an ls -ltr in the *_refmapping and _clust directories? I know they will be quite large because you have a lot of samples, but I need to see what's going on.
Certainly - Attached are the results of these two folders. Thank you so much for taking the time to look this over. ls_ltr_tmp.txt ls_ltr_refmapping.txt ls_ltr_clust.txt
In the *_refmapping directory you can see these files:
-rw------- 1 churt domain users 2477521779 Jun 6 08:27 B6_E005Y2.sam
-rw------- 1 churt domain users 219666275 Jun 6 08:28 B6_E005Y2-unmapped.bam
-rw------- 1 churt domain users 708752024 Jun 6 08:29 B6_E005Y2-mapped-sorted.bam
-rw------- 1 churt domain users 728515803 Jun 6 08:29 B6_E005Y2-unmapped.fastq
but there should be a *-mapped-sorted.bam.bai file, which should be generated by a samtools index command. It's possible that samtools is not installed or not installed correctly on your system (even though it should come down as a dependency of ipyrad). Can you log in to the computer you're running ipyrad on, change directory to the *_refmapping directory and run this command:
samtools index B6_E005Y2-mapped-sorted.bam
Please make sure you are in the same conda environment as you are when you run ipyrad. Let me know what happens, and if there are any error messages please post them here.
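With many samples, scanning the directory listing by eye is tedious. Below is a rough sketch that reports which sorted bam files have no index next to them, and whether a `samtools` executable is visible in the current environment. The function names are hypothetical, the `*-mapped-sorted.bam` pattern is taken from the listing above, and accepting either a `.bai` or a `.csi` file as "indexed" is an assumption based on the two index formats samtools can write.

```python
import shutil
from pathlib import Path


def samtools_on_path():
    """Return True if a `samtools` executable is found on the current PATH."""
    return shutil.which("samtools") is not None


def bams_missing_index(refmapping_dir, pattern="*-mapped-sorted.bam"):
    """Return names of bam files in refmapping_dir with no .bai or .csi index."""
    missing = []
    for bam in sorted(Path(refmapping_dir).glob(pattern)):
        bai = bam.with_name(bam.name + ".bai")
        csi = bam.with_name(bam.name + ".csi")
        if not (bai.exists() or csi.exists()):
            missing.append(bam.name)
    return missing
```

Run it from inside the same conda environment used for ipyrad, so that `samtools_on_path()` reflects the samtools that ipyrad would actually call.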
Hello,
Here is the screenshot from the refmapping directory:
Please let me know if you need more information. Thanks for your help!
I see the problem. Samtools index by default can't handle ref seqs with large chrom size. There's a flag to pass the index command to allow this, so I updated the code to use the samtools index -c option by default. I pushed a new version which should be up on bioconda within a day or so (v0.9.80). Once it is up there please install it and try again (you should only have to run step 3 again).
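To make the size limit concrete: as far as I understand it, the default BAI index cannot address positions beyond 2^29 bases (about 512 Mbp) on a single reference sequence, while the CSI index written by `samtools index -c` can. The sketch below is not how ipyrad does it (the fix above simply uses `-c` by default); it only illustrates the check. It assumes a `.fai` file for the reference exists (for example from `samtools faidx`), and the function names and the `BAI_MAX_CONTIG` constant are hypothetical.

```python
BAI_MAX_CONTIG = 2 ** 29  # 536,870,912 bases, the assumed BAI coordinate limit


def longest_contig(fai_path):
    """Return (name, length) of the longest sequence listed in a .fai file."""
    longest = ("", 0)
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            name, length = fields[0], int(fields[1])
            if length > longest[1]:
                longest = (name, length)
    return longest


def index_command(bam_path, fai_path):
    """Build a samtools index command, adding -c when a contig is too long for BAI."""
    _, length = longest_contig(fai_path)
    cmd = ["samtools", "index"]
    if length > BAI_MAX_CONTIG:
        cmd.append("-c")  # write a CSI index instead of BAI
    cmd.append(str(bam_path))
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; note that with `-c` the index file ends in `.csi` rather than `.bai`.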
Hello, I am on step 3 and I'm using a related species as a reference genome. This genome is a beast (32 Gb)! The program seems to recognize that we are dealing with large chromosomes. There are a couple of issues. I see where it is recommending the -c flag but there is also a python error dealing with 'str' that I'm not sure how to fix. Thank you in advance for your help!