ccb-hms / nhanes-database

Metadata.DownloadErrors is empty #91

Closed: rgentlem closed this issue 11 months ago

rgentlem commented 1 year ago

Hi, it seems that the Metadata.DownloadErrors table is empty. It will be important to have this populated appropriately so that we can figure out what is in the DB and what is not, and so that we can fix the download errors.

sam-pullman commented 1 year ago

This is a good thing! DownloadErrors is empty because none of the tables failed to download.

rgentlem commented 1 year ago

Thanks, Sam. I guess I am not sure how that is determined. Is there a master list of the tables that you intended to download? I can find tables that are listed at https://wwwn.cdc.gov/Nchs/Nhanes/search/DataPage.aspx, which I think is as close to a master list as exists (although I am talking with Rafael about this). Mostly, I think the issue here is to identify such a master list and to decide, together with Rafael, what is going to be included in a release. One challenge is that new data are released (regularly?), but it would be good to know what was supposed to come, and then figure out what didn't download as data and what didn't download as documentation. Nathan and Deepayan are, I think, going over some of the results at https://github.com/ccb-hms/nhanes-exploration, which outline places where there are some differences. I am not suggesting you do anything; I think this needs a longer discussion, looking at the output Deepayan has generated and seeing what the best options are for us when building the DB.

sam-pullman commented 1 year ago

The DownloadErrors table is where we can see tables that have failed somewhere in the download process. That process begins by creating a master list (from the very same page you linked above, DataPage.aspx), then iterating through it and attempting to download every file listed. The list is not checked against anything else per se, because by default we attempt to collect all the files (except for those in the ExcludedTables manifest), and if any of them error out they are listed in DownloadErrors. Hopefully that makes sense; I'll be on the lookout for new tickets that arise from Deepayan's and Nathan's exploration.
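
A minimal Python sketch of the loop described above, under stated assumptions: `download_all`, `DownloadRun`, and the injected `fetch` callable are hypothetical names for illustration, not the repository's actual code, and the master list is assumed to be already parsed from DataPage.aspx into a list of table names. It shows why DownloadErrors can be empty even when the DB is incomplete: only tables that were attempted and raised an error get recorded.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class DownloadRun:
    # Hypothetical container; stands in for the downloaded tables and DownloadErrors.
    downloaded: list[str] = field(default_factory=list)
    errors: dict[str, str] = field(default_factory=dict)  # table name -> error text


def download_all(master_list: Iterable[str],
                 excluded: set[str],
                 fetch: Callable[[str], bytes]) -> DownloadRun:
    """Try every table on the master list except excluded ones; record failures."""
    run = DownloadRun()
    for table in master_list:
        if table in excluded:
            continue  # skipped on purpose (ExcludedTables), so not an error
        try:
            fetch(table)  # assumed to raise on any download/parse failure
            run.downloaded.append(table)
        except Exception as exc:
            run.errors[table] = repr(exc)  # what would land in DownloadErrors
    return run
```

With a fake `fetch` that raises for one table, e.g. `download_all(["DEMO_J", "BAD_X", "SKIP_Y"], {"SKIP_Y"}, fake_fetch)`, only `BAD_X` ends up in `errors`. The excluded table is skipped silently, and a table that never made it onto the master list does not appear anywhere.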

sam-pullman commented 11 months ago

One solution is to take the datapage.aspx file and turn it into a manifest table in the metadata. I'll do this.
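
One way such a manifest could then be used, sketched in Python under assumptions: the manifest, the downloaded tables, the ExcludedTables entries, and the DownloadErrors entries are each loaded into a set of table names (how they are queried from the DB is left out), and `reconcile` is a hypothetical helper, not an existing function in the repository. Comparing the sets separates tables that failed from tables that were never attempted, which an empty DownloadErrors alone cannot show.

```python
from __future__ import annotations


def reconcile(manifest: set[str], downloaded: set[str],
              excluded: set[str], errors: set[str]) -> dict[str, set[str]]:
    """Bucket table names by comparing the manifest against what happened."""
    return {
        # Listed on the data page but deliberately skipped.
        "excluded": manifest & excluded,
        # Attempted and recorded as failures (the part DownloadErrors covers).
        "failed": (manifest & errors) - excluded,
        # Expected, but neither downloaded, excluded, nor logged as an error.
        "missing_unexplained": manifest - downloaded - excluded - errors,
        # Present in the DB but absent from the manifest.
        "not_in_manifest": downloaded - manifest,
    }
```

The `missing_unexplained` bucket is the one that would answer "what was supposed to come but didn't"; the same comparison could be run separately for data files and documentation files.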

nathan-palmer commented 11 months ago

See #121 and #122.