Closed: carlahurt closed this issue 3 years ago
Hello. This is almost certainly a disk space issue. I'm positive that you ran out of disk space right at the beginning of the aligning step. Please be sure that you have plenty of free disk space, like hundreds of GB or more, as much as you can get. ipyrad creates numerous temporary files (most of which are cleaned up after each step), but we assume that disk space is essentially unlimited. If your assembly is especially large then this is even more true. Unfortunately there isn't a way to restart within a given step, so you'll have to run step 3 again.

The clustering step can be impacted by very long reads, paired-end data, and also very noisy data. I would verify with fastqc that your data does not contain a significant amount of low-quality bases, and if it does I would trim the reads (during step 2) to remove as much of this as possible. This will speed up the clustering step.
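As a quick sanity check before relaunching step 3, you can verify free space on the filesystem that holds the project directory. A minimal sketch, assuming a Linux system with GNU df; `PROJECT_DIR` is a placeholder for whatever your params file names as the project_dir:

```shell
#!/bin/sh
# Minimal free-space check before re-running ipyrad step 3.
# PROJECT_DIR is a placeholder; point it at the project_dir from your params file.
PROJECT_DIR="${PROJECT_DIR:-.}"

# Available space, in whole GiB, on the filesystem holding the project directory.
avail_gb=$(df -P -BG "$PROJECT_DIR" | awk 'NR==2 {gsub(/G/, "", $4); print $4}')
echo "Free space under $PROJECT_DIR: ${avail_gb} GB"

# Hundreds of GB are recommended above; warn well below that.
if [ "$avail_gb" -lt 100 ]; then
    echo "WARNING: under 100 GB free; step 3 may fail mid-alignment" >&2
fi
```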
Hey, Isaac, Dr. Hurt’s HPC admin here. Where are the temporary files stored by default: /tmp or elsewhere? If /tmp, is there a way to override that, via environment variable or parameter?
Hello Mike, all the temporary files are created within the project_dir, which is specified in the params file for a given assembly. We don't touch the filesystem outside of this directory.
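For reference, the project_dir is parameter [1] near the top of the params file; the paths and assembly name below are illustrative, not taken from Carla's actual params file:

```
------- ipyrad params file (v.0.9.*)-------------------------------------------
barb1                   ## [0] [assembly_name]: Assembly name. Used to name output directories
/scratch/churt/barb1    ## [1] [project_dir]: Project dir (made in curdir if not present)
```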
In that case, we should have some dozen or so TB free there. Can look more closely on Tuesday or later.
If it's not disk it could also be some other resource if there are quotas, for example I have seen issues with quotas on max number of files which could cause a similar kind of behavior.
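A quick way to check for file-count (inode) limits on the filesystem holding the project directory. A sketch, assuming Linux with GNU df; the `quota` tool may not be installed on every node, and `PROJECT_DIR` is again a placeholder:

```shell
#!/bin/sh
# Check inode usage and any per-user quotas on the project filesystem.
PROJECT_DIR="${PROJECT_DIR:-.}"  # placeholder for the params-file project_dir

# Percentage of inodes in use on that filesystem (e.g. "3%").
inode_use=$(df -P -i "$PROJECT_DIR" | awk 'NR==2 {print $5}')
echo "Inode usage under $PROJECT_DIR: $inode_use"

# Per-user block/inode quotas, if the quota tools are installed.
quota -s 2>/dev/null || echo "quota tool not installed or no quotas reported"
```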
Far as I know, we have quota reporting, but no limits on space, file count, inodes, etc. Can verify later and let you know, thanks.
Hi Mike,
Thank you for trying to figure this out. Let me know if I need to delete some files.
Carla
Carla Hurt Associate Professor of Biology Tennessee Tech University (931) 372-3143 https://sites.tntech.edu/hurtlab
From: Mike Renfro. Sent: Sunday, July 4, 2021 4:45:02 PM. To: dereneaton/ipyrad. Subject: Re: [dereneaton/ipyrad] Crash at the end of step 3 (#448)
From the file server's quota reports:
User quota on /mnt/xfs1 (/dev/mapper/VGMD1-LVMD1)

                      Blocks                            Inodes
User ID     Used   Soft  Hard  Warn/Grace       Used    Soft  Hard  Warn/Grace
---------- --------------------------------- ---------------------------------
...
churt       2.8T      0     0  00 [------]     307.4k      0     0  00 [------]
...
So there shouldn't be any file size or file count limits in place.
Some strategic googling leads me to believe that the error messages we are seeing are SLURM red herrings:
https://github.com/E3SM-Project/E3SM/issues/3138
https://www.mail-archive.com/slurm-users@lists.schedmd.com/msg04725.html
https://www.lstsrv.ncep.noaa.gov/pipermail/ncep.list.fv3-announce/2020-September/000410.html
Looking back at the output you originally sent, it seems possible that these messages are internal SLURM noise and that the ipyrad assembly actually completed successfully.
Is it possible that step 6 actually ran to completion? Did you try running step 7?
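If step 6 did complete, step 7 can be launched on its own with the standard ipyrad CLI (`-p` selects the params file, `-s` the steps to run). A sketch, guarded so it degrades gracefully when ipyrad isn't on the PATH:

```shell
#!/bin/sh
# Run only step 7 against the existing assembly; earlier steps are not re-run.
# params-barb1.txt is the params file attached earlier in this thread.
if command -v ipyrad >/dev/null 2>&1; then
    ipyrad -p params-barb1.txt -s 7
    status="ran step 7"
else
    status="skipped: ipyrad not on PATH (activate its conda environment first)"
fi
echo "$status"
```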
You're right - it worked!!! Thank you for your help - I apologize for the unnecessary drama. I assumed that with the two pages of error messages it couldn't have worked.
Very good. This is my favorite kind of problem.... the one that solves itself ;)
Hello, I am running a denovo assembly on a very large salamander GBS dataset. Step 3 took a very long time (12 days), and then something appeared to have gone wrong during the aligning step. This caused it to crash in step 6. Any ideas on what went wrong? Also, is it possible to pick this back up after "chunking clusters" in step 3 and skip the long wait? I posted the errors in two snips - the messages were too long to fit in a single screenshot. I'm also attaching my params file for reference.
params-barb1.txt
Thank you, Carla