Open tjgalvin opened 1 year ago
Thanks for documenting this @tjgalvin. I've got an open PR (perhaps prematurely) on RMextract. Let's use my fork and combine our changes there: https://github.com/AlecThomson/RMextract
Then I'll open an issue on the main fork and see if the maintainers are interested.
I've opened up a race branch that tries to capture the file-existence checks, with some slight cleanup. I think this should be merged after a review.
We will still need to add some exposed option to provide a formatter lookup.
When running the `FRion` correction stage of the arrakis pipeline, a number of issues internal to `RMextract` were found. This issue aims to record them for future generations.
The compute nodes of petrichor do not have external network access. Internally, `RMextract` attempts to download data files from remote sites over FTP. This fails. @AlecThomson found that by supplying dummy `proxy` settings through to `RMextract`, it could be convinced to use a cached copy of these files, so long as the `server` URL is of the form `file:///path/to/ionex/files`.
In this approach a secondary issue was found, where the data file is copied into place and extracted using a subprocess call to `gunzip` if the file ends in `.Z`. Although this works in a single-process environment, concurrently running tasks each try to download (i.e. copy from the local cache) and extract the same file. This results in race conditions where the file is constantly overwritten, and either `gunzip` fails outright or the extracted file is incomplete. I added a collection of `os.path.exists`-type checks and early escapes, which seems to get me by.
We should merge these fixes together and have a static fork?
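For reference, here is a minimal sketch of one way to make the copy-and-extract step safe when several tasks run at once. It is not the code in `RMextract` or in either fork; the names `stage_file` and `gunzip_to` are hypothetical. The idea is to keep the existence check as an early escape, but have each task extract into its own temporary file and then rename it into place atomically, so a reader never sees a half-written file.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def gunzip_to(src: Path, dest: Path) -> None:
    """Decompress src (.Z or .gz) into dest by streaming `gunzip -c`."""
    with open(dest, "wb") as out:
        subprocess.run(["gunzip", "-c", str(src)], stdout=out, check=True)


def stage_file(
    cached: Path,
    final: Path,
    extract: Callable[[Path, Path], None] = gunzip_to,
) -> Path:
    """Extract `cached` to `final` at most once per call, without partial files.

    Hypothetical helper: `extract(src, dest)` is any callable that writes the
    decompressed contents of src to dest.
    """
    # Early escape: another task (or an earlier run) already produced the file.
    if final.exists():
        return final

    final.parent.mkdir(parents=True, exist_ok=True)

    # Each task gets a unique temporary file in the destination directory,
    # so concurrent tasks never write to the same path.
    fd, tmp_name = tempfile.mkstemp(
        dir=final.parent, prefix=final.name + ".", suffix=".tmp"
    )
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        extract(cached, tmp)
        # Rename within one directory is atomic: `final` is either absent
        # or complete, never partially written.
        os.replace(tmp, final)
    finally:
        # Remove the temporary file if extraction failed before the rename.
        tmp.unlink(missing_ok=True)
    return final
```

Two tasks that both pass the existence check will each do the extraction, and the later rename simply replaces the earlier result with an identical, complete file. That wastes a little work but avoids needing a lock file on the shared filesystem.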