cytomining / profiling-handbook

Image-based Profiling Handbook
https://cytomining.github.io/profiling-handbook/
Creative Commons Zero v1.0 Universal

Use temp dir flag when making CSVs #7

bethac07 closed this 2 years ago

shntnu commented 5 years ago

Do you recollect details of this issue, @bethac07?

I thought it meant that we use the -t flag when calling create_csv_from_xml.sh, but that's probably not it.

bethac07 commented 5 years ago

I do believe that's it, yes.

shntnu commented 5 years ago

Hm – the flag has been defined but is not used: https://github.com/broadinstitute/cellpainting_scripts/blob/master/create_csv_from_xml.sh#L30

It's possible that what we meant here was that create_csv_from_xml.sh is slow on EFS, so we should use a temp dir (and thus specify the temp flag) to copy the batch files to local EBS storage, and then create the load data there. But I haven't had issues recently with create_csv_from_xml.sh.
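For reference, the "copy locally, then process" idea could look roughly like the sketch below. This is only an illustration under assumptions: `run_on_local_copy` and the `{input}` / `{output}` placeholders are hypothetical names, it does not reflect how create_csv_from_xml.sh actually handles its temp flag, and it assumes the temp dir lives on local (EBS) storage while the source and destination are on the slower filesystem.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_on_local_copy(src_dir, out_dir, command, tmp_root=None):
    """Copy src_dir into a local temp dir, run command on the copy,
    then copy whatever the command wrote back to out_dir.

    command is an argument list; "{input}" and "{output}" inside any
    argument are replaced with the local input and output paths.
    (Hypothetical helper, not part of create_csv_from_xml.sh.)
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    # tmp_root=None lets tempfile pick the default temp location.
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp:
        local_in = Path(tmp) / "input"
        local_out = Path(tmp) / "output"
        shutil.copytree(src_dir, local_in)  # one bulk read from the slow filesystem
        local_out.mkdir()
        argv = [
            arg.replace("{input}", str(local_in)).replace("{output}", str(local_out))
            for arg in command
        ]
        subprocess.run(argv, check=True)  # all small reads/writes stay local
        out_dir.mkdir(parents=True, exist_ok=True)
        shutil.copytree(local_out, out_dir, dirs_exist_ok=True)  # Python 3.8+
    return out_dir
```

Whether this helps depends on where the time actually goes; it only pays off if many small reads on EFS are the bottleneck.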

So unless there's something else we need to address here, we could close this.

bethac07 commented 5 years ago

The major issue I've run into with it is that it's very slow when run in parallel: it finishes in 5-10 minutes when not run with parallel, but takes many hours if you're running more than a couple at once with parallel. I do not recall if there was once a reason we suspected it was EBS/EFS related (vs., say, network I/O), and my Slack investigations are not turning up anything.
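One way to put numbers on this would be to time the same set of jobs at a few concurrency levels. The sketch below is an assumption-laden illustration, not something from the existing scripts: `timed_run` and `compare_concurrency` are hypothetical names, and each entry in `commands` stands in for whatever invocation is being measured.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(argv):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare_concurrency(commands, levels=(1, 2, 4)):
    """Run all commands once per concurrency level.

    Returns {level: (total_seconds, [per_job_seconds, ...])}.
    Threads are enough here because each job is a separate process.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=level) as pool:
            per_job = list(pool.map(timed_run, commands))
        results[level] = (time.perf_counter() - start, per_job)
    return results
```

If per-job times climb steeply as the level goes up, that would suggest the jobs are contending for some shared resource, though it would not by itself say whether that resource is EFS, network I/O, or something else.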

shntnu commented 5 years ago

I lost the reply I was writing here, but the essence is that we'd need to address this https://github.com/broadinstitute/pe2loaddata/issues/11 to fix this issue. I'll keep this open.