USEPA / esupy

A library supporting Python-based tools in USEPA's tool ecosystem

Modify download data fxn to also download meta/log files #13

Closed catherinebirney closed 3 years ago

catherinebirney commented 3 years ago
bl-young commented 3 years ago

This could be due to my change in the parsing function, but the git hash is now missing from the df for the _metadata and _validation files:

    name                      version  git_hash  ext      date                       file_name
11  Employment_state_2013     0.2      bbb56be   parquet  2021-08-12 20:03:32+00:00  Employment_state_2013_v0.2_bbb56be.parquet
12  Employment_state_2013     0.2                json     2021-08-12 20:03:32+00:00  Employment_state_2013_v0.2_bbb56be_metadata.json

    name                      version  git_hash  ext      date                       file_name
5   Employment_national_2017  0.2      135fd2b   log      2021-08-17 17:00:39+00:00  Employment_national_2017_v0.2_135fd2b.log
8   Employment_national_2017  0.2                log      2021-08-17 17:00:39+00:00  Employment_national_2017_v0.2_135fd2b_validation.log

Edit: However, based on how the files are pulled, this has no consequences; the files still get pulled correctly.
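For reference, here is a minimal sketch of a file-name parser that keeps the git hash when a _metadata or _validation suffix is present. It is not esupy's actual parsing function: the pattern, the 7-character hex hash assumption, and the name `parse_file_name` are hypothetical, inferred only from the file names shown in the tables in this comment.

```python
import re

# Hypothetical pattern (not esupy's real parser): name, version,
# optional 7-character hex git hash, optional _metadata/_validation
# suffix, then the extension.
FILE_PATTERN = re.compile(
    r"^(?P<name>.+?)"
    r"_v(?P<version>\d+(?:\.\d+)*)"
    r"(?:_(?P<git_hash>[0-9a-f]{7}))?"
    r"(?:_(?P<suffix>metadata|validation))?"
    r"\.(?P<ext>\w+)$"
)


def parse_file_name(file_name):
    """Split a file name into its parts; return None if it does not match."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        return None
    # Groups that did not participate in the match come back as None.
    return match.groupdict()


meta = parse_file_name("Employment_state_2013_v0.2_bbb56be_metadata.json")
log = parse_file_name("Employment_national_2017_v0.2_135fd2b_validation.log")
print(meta["git_hash"], meta["suffix"], meta["ext"])
print(log["git_hash"], log["suffix"], log["ext"])
```

Treating the suffix as its own optional group, rather than letting it fall into the hash or extension, is what keeps the git_hash column populated for the companion files.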

bl-young commented 3 years ago

@catherinebirney confirmed successful downloading in StEWI despite the missing git hash, though I still pull the files separately, which probably helps.

catherinebirney commented 3 years ago

Hmm, okay, I will look into why the git hash gets dropped. Does your code pull the two files by running download_from_remote() twice, even though I changed the functions to return a list of files?

bl-young commented 3 years ago

@catherinebirney I made some updates as we discussed. This now uses the extension to select only the most recent file with the given extension, but still returns all file types that share the version/hash (v/h) of that most recent parquet. Give it a try to make sure it works for you. I was able to remove the double download in StEWI with this structure.

Once you test/confirm I'm good to pull this in. I think we should update the release for this.
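The selection logic described in this comment can be sketched roughly as follows. This is an illustration only, not the code in the pull request: the function name `select_latest_file_set`, the record layout, and the older v0.1 file are hypothetical, and the sketch assumes the git hash has been parsed for every file, including the metadata file.

```python
from datetime import datetime, timezone


def select_latest_file_set(records, ext="parquet"):
    """Find the newest file with the given extension, then return every
    file name that shares its name, version, and git hash."""
    candidates = [r for r in records if r["ext"] == ext]
    if not candidates:
        return []
    latest = max(candidates, key=lambda r: r["date"])
    keys = ("name", "version", "git_hash")
    return [
        r["file_name"]
        for r in records
        if all(r[k] == latest[k] for k in keys)
    ]


records = [
    # Hypothetical older release, added only to exercise the selection.
    {"name": "Employment_state_2013", "version": "0.1", "git_hash": "aaaaaaa",
     "ext": "parquet",
     "date": datetime(2021, 7, 1, 12, 0, 0, tzinfo=timezone.utc),
     "file_name": "Employment_state_2013_v0.1_aaaaaaa.parquet"},
    # File names and dates taken from the table earlier in this thread.
    {"name": "Employment_state_2013", "version": "0.2", "git_hash": "bbb56be",
     "ext": "parquet",
     "date": datetime(2021, 8, 12, 20, 3, 32, tzinfo=timezone.utc),
     "file_name": "Employment_state_2013_v0.2_bbb56be.parquet"},
    {"name": "Employment_state_2013", "version": "0.2", "git_hash": "bbb56be",
     "ext": "json",
     "date": datetime(2021, 8, 12, 20, 3, 32, tzinfo=timezone.utc),
     "file_name": "Employment_state_2013_v0.2_bbb56be_metadata.json"},
]

latest_files = select_latest_file_set(records, ext="parquet")
print(latest_files)
```

Returning the whole matching set from one call is what would let a caller avoid downloading the data file and its metadata in two separate passes.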

catherinebirney commented 3 years ago

Excellent - the code works in flowsa as well. Agreed that we should update the release.