catherinebirney closed this 3 years ago
This could be due to my change in the parsing function, but now the git hash is getting lost from the df for the _metadata and _validation files:
| | name | version | git_hash | ext | date | file_name |
|---|---|---|---|---|---|---|
| 11 | Employment_state_2013 | 0.2 | bbb56be | parquet | 2021-08-12 20:03:32+00:00 | Employment_state_2013_v0.2_bbb56be.parquet |
| 12 | Employment_state_2013 | 0.2 | | json | 2021-08-12 20:03:32+00:00 | Employment_state_2013_v0.2_bbb56be_metadata.json |

| | name | version | git_hash | ext | date | file_name |
|---|---|---|---|---|---|---|
| 5 | Employment_national_2017 | 0.2 | 135fd2b | log | 2021-08-17 17:00:39+00:00 | Employment_national_2017_v0.2_135fd2b.log |
| 8 | Employment_national_2017 | 0.2 | | log | 2021-08-17 17:00:39+00:00 | Employment_national_2017_v0.2_135fd2b_validation.log |
Edit: However, based on how the files are pulled, this has no consequences; the files still get pulled correctly.
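For reference, a minimal sketch of a file-name parser that keeps the git hash when a trailing suffix such as `_metadata` or `_validation` is present. This is not the project's actual parsing function: the regular expression, the `suffix` field, and the name `parse_file_name` are assumptions made for illustration, based only on the file names shown in the tables above (`<name>_v<version>_<7-character hash>[_<suffix>].<ext>`).

```python
import re

# Hypothetical pattern inferred from the file names in this thread:
#   <name>_v<version>[_<7-char hex hash>][_<suffix>].<ext>
FILE_PATTERN = re.compile(
    r"^(?P<name>.+?)_v(?P<version>\d+(?:\.\d+)*)"
    r"(?:_(?P<git_hash>[0-9a-f]{7}))?"   # optional short git hash
    r"(?:_(?P<suffix>[A-Za-z]+))?"       # optional suffix, e.g. metadata
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_file_name(file_name):
    """Split a file name into its parts; return None if it does not match."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        return None
    # Groups that did not participate in the match are returned as None.
    return match.groupdict()


parsed = parse_file_name("Employment_state_2013_v0.2_bbb56be_metadata.json")
print(parsed["git_hash"], parsed["suffix"], parsed["ext"])
```

Treating the suffix as its own optional group means the hash is matched before the suffix, so it is not swallowed or dropped for the `_metadata` and `_validation` files.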
@catherinebirney confirmed successful downloading in StEWI despite the missing git hash, though I still pull the files separately, which probably helps.
Hmm, okay, I will look into why the git hash gets dropped. Does your code pull the two files by running download_from_remote() twice, even though I changed the functions to return a list of files?
@catherinebirney I made some updates as we discussed. The code now uses the extension to find only the most recent file with that extension, but still returns all file types that share the version/hash of that most recent parquet. Give it a try to make sure it works for you. I was able to remove the double download in StEWI with this structure.
Once you test/confirm I'm good to pull this in. I think we should update the release for this.
Excellent - the code works in flowsa as well. Agreed that we should update the release
Update get_most_recent_from_index() to return a list of file names rather than a single name
- the file names in the list have the same version and git hash as the most recent file
- any file with the identified version/hash is downloaded
- allows for multiple log files, metadata files, or any future file types that are uploaded
Potential issue with data that do not have version/hash
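The selection logic described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual body of get_most_recent_from_index(): the function name `select_matching_files`, the record layout (plain dicts with the same columns as the index tables above), and the older `v0.1` row are all hypothetical, and the records are assumed to already be filtered to a single dataset.

```python
from datetime import datetime, timezone


def select_matching_files(records, ext="parquet"):
    """Return every file name sharing version/hash with the newest `ext` file."""
    candidates = [r for r in records if r["ext"] == ext]
    if not candidates:
        return []
    # The most recent file of the requested extension sets the version/hash.
    newest = max(candidates, key=lambda r: r["date"])
    return [
        r["file_name"]
        for r in records
        if r["version"] == newest["version"]
        and r["git_hash"] == newest["git_hash"]
    ]


records = [
    # Hypothetical older release, added only to show the filtering.
    {"version": "0.1", "git_hash": "aaaaaaa", "ext": "parquet",
     "date": datetime(2021, 7, 1, tzinfo=timezone.utc),
     "file_name": "Employment_state_2013_v0.1_aaaaaaa.parquet"},
    {"version": "0.2", "git_hash": "bbb56be", "ext": "parquet",
     "date": datetime(2021, 8, 12, 20, 3, 32, tzinfo=timezone.utc),
     "file_name": "Employment_state_2013_v0.2_bbb56be.parquet"},
    {"version": "0.2", "git_hash": "bbb56be", "ext": "json",
     "date": datetime(2021, 8, 12, 20, 3, 32, tzinfo=timezone.utc),
     "file_name": "Employment_state_2013_v0.2_bbb56be_metadata.json"},
]

print(select_matching_files(records))
# ['Employment_state_2013_v0.2_bbb56be.parquet',
#  'Employment_state_2013_v0.2_bbb56be_metadata.json']
```

Note that this sketch only matches the metadata file if its git hash was parsed correctly. It also shows the potential issue mentioned above: if version and hash are both missing, every record without them compares equal, so older files without a version/hash would be returned alongside the newest one.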