cgroza / GraffiTE

GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome-graphs.

Question about './work directory storage issue' #47

Open xxYaaoo opened 2 weeks ago

xxYaaoo commented 2 weeks ago

Hello,

Is it safe to delete folders in the ./work directory that belong to steps already reported as completed in output.log, in order to reduce storage use? For example, could I delete the 'cd/d3d2ce' directory while the pipeline is still running?


Also, when I run 105 samples to compare the output of 'GT-sv-GA' and 'GT-svsn-GA' at the same time, ./work can grow to over 10 TB. Is this normal?

Thank you for the development and maintenance of GraffiTE !!

clemgoub commented 2 weeks ago

Hello @xxYaaoo,

Thank you for your feedback, this is indeed something we should look into. Can you tell us whether any single process uses most of the storage? If it is the graph alignments, perhaps we can delete the alignment files once the VCF for the sample is made.
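
One way to check which steps take the most space is to rank the task directories under ./work by size and match the short hashes (such as 'cd/d3d2ce') against the process names in output.log. The following is only a minimal sketch, not part of GraffiTE: it assumes ./work uses a two-level layout of the form work/<2 characters>/<hash>/, and the function names `dir_size` and `largest_task_dirs` are hypothetical helpers made up for illustration.

```python
import os
from pathlib import Path


def dir_size(path):
    """Sum the sizes (bytes) of regular files under path.

    Symlinked files are skipped so that inputs staged as links
    are not counted against the task that merely references them.
    """
    total = 0
    for root, _dirs, files in os.walk(path):  # does not follow dir symlinks
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def largest_task_dirs(work_dir="work", top=10):
    """Return up to `top` (short_hash, bytes) pairs, largest first.

    Assumption: task directories live at work_dir/<prefix>/<hash>/.
    The short hash is "<prefix>/<first 6 characters of hash>".
    """
    root = Path(work_dir)
    if not root.is_dir():
        return []
    sizes = []
    for prefix in root.iterdir():
        if not prefix.is_dir():
            continue
        for task in prefix.iterdir():
            if task.is_dir():
                short = f"{prefix.name}/{task.name[:6]}"
                sizes.append((short, dir_size(task)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Print the ten largest task directories in ./work.
    for short, size in largest_task_dirs("work"):
        print(f"{short}\t{size / 1e9:.2f} GB")
```

Run from the directory that contains ./work, this prints one line per task directory with its short hash and size, which can then be looked up in output.log to see which process it belongs to.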

For the time being, you should be safe to delete a ./work directory once make_graph is complete. At this point, you will be able to use the VCF produced by the tsd_ processes, in the 3_TSD_Search output directory.

Cheers,

Clém