bcgsc / straglr

Tandem repeat expansion detection or genotyping from long-read alignments

Could I set the work directory for straglr? #4

Closed ttbond closed 1 year ago

ttbond commented 2 years ago

Hi~ I ran straglr on our cluster but some tasks failed. I found that the default work directory of straglr is /tmp (many tmp files appeared under this directory while straglr was running), but /tmp on each node of our cluster has only about 50GB of storage, which led to straglr crashing when /tmp filled up. So, could I set the working directory?

Best wishes, ttbond

readmanchiu commented 2 years ago

Thanks for asking. Being able to set the working directory where the temporary files will reside is probably a good feature to have. Currently there isn't such an option. I can quickly add the feature and update here when it's implemented. It will probably take one or two working days.

In the meantime, maybe you can try setting the TEMP environment variable to another location that has more storage, like

export TEMP=/your/path

That may be a band-aid solution for now.
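A note on which variable name matters: Python's standard tempfile module checks TMPDIR first, then TEMP, then TMP, and only then falls back to locations such as /tmp. So if TMPDIR is already set in the job environment (as some job schedulers do), exporting TEMP alone has no effect. The sketch below demonstrates that lookup order. It assumes straglr creates its temporary files through tempfile, which is not verified here, and resolve_tempdir is a hypothetical helper written only for this illustration.

```python
import os
import tempfile

TEMP_VARS = ("TMPDIR", "TEMP", "TMP")  # the order in which tempfile checks them


def resolve_tempdir(env):
    """Return the directory tempfile would choose under the given variables.

    `env` maps any of TMPDIR/TEMP/TMP to an existing, writable directory.
    The process environment and tempfile's cached value are restored on exit.
    (Hypothetical helper for illustration; not part of straglr.)
    """
    saved_env = {name: os.environ.get(name) for name in TEMP_VARS}
    saved_cache = tempfile.tempdir
    try:
        for name in TEMP_VARS:
            os.environ.pop(name, None)
        os.environ.update(env)
        tempfile.tempdir = None  # clear the cache so the lookup runs again
        return tempfile.gettempdir()
    finally:
        for name, value in saved_env.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        tempfile.tempdir = saved_cache


if __name__ == "__main__":
    print("Temporary files currently go to:", tempfile.gettempdir())
    for name in TEMP_VARS:
        print(f"  {name} = {os.environ.get(name)!r}")
```

Running this inside the same job script that launches the tool shows which of the three variables is actually in effect on a given node.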

ttbond commented 2 years ago

Thanks for your kind reply. I tried setting $TEMP, but it didn't work. Looking forward to the update! The commands I used:


PATH=/data/home/xutun/miniconda3/envs/straglr/bin:$PATH
export TEMP=/data/home/xutun/dotplotSv/trfTest/straglr/straglrRun
straglr.py /data/home/xutun/dotplotSv/trfTest/straglr/aln/HG00733.merged.mm.sorted.bam /data/home/xutun/ref/GRCh38_NCBI/gr38.fa /data/home/xutun/dotplotSv/trfTest/straglr/rel/HG00733.merged.batch0088 --min_support 2 --loci /data/home/xutun/dotplotSv/trfTest/straglr/tmpWorkD/batch0088.bed --max_str_len 100 --max_num_clusters 2 --nprocs 12

readmanchiu commented 2 years ago

I have added the option --tmpdir to specify the tmp dir. I guess different Linux flavors may use a different name for the tmp dir variable; hopefully Python will handle it properly. Please clone the repo again and give it a try.

Also please note that the tmp directory (with or without your customized location) is used for generating the many temporary FASTA and BED files, which are automatically removed unless --debug is specified. The current working directory where the script is run will temporarily hold the outputs of Tandem Repeats Finder (TRF), so you might want to make sure the current working directory is also not too limited in space (these outputs are also removed when the run is finished).
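Since both the tmp directory and the current working directory need room, a quick pre-flight check before launching a run can catch a nearly full disk early. The following is a minimal sketch using only the Python standard library. The helper names and the 50 GiB threshold are placeholders chosen for illustration, not values recommended by straglr.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def low_space_paths(paths, min_free_gib):
    """Return the paths whose filesystem has less than `min_free_gib` free."""
    return [p for p in paths if free_gib(p) < min_free_gib]


if __name__ == "__main__":
    # Substitute the directory passed to --tmpdir, if one is used.
    tmp_dir = tempfile.gettempdir()
    work_dir = os.getcwd()  # TRF outputs are written here during the run

    # 50 is an arbitrary placeholder threshold, not a straglr requirement.
    for path in low_space_paths([tmp_dir, work_dir], min_free_gib=50):
        print(f"Warning: only {free_gib(path):.1f} GiB free under {path}")
```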