thanks so much for your thoughtful edits. i need to rewrite it so that the catalog is read-only.
Sure thing! I still think my first suggestion (the line I wrote in #126) is the better option: only unzip the named file and ignore the other one, which is a duplicate anyway. That also keeps the catalog intact, and it's a lot cleaner and less kludgy.
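To sketch what I mean (just an illustration against a generic catalog row, not the actual lodown code; the helper name is made up and I'm assuming the row carries an `output_filename`):

```r
# sketch only: extract just the file named in the catalog row and
# ignore any duplicate companion files bundled in the same zip
extract_named_file <- function( zip_path , catalog_row ){

	# list everything inside the downloaded archive
	contents <- unzip( zip_path , list = TRUE )$Name

	# keep only the entry whose name matches the catalog's output_filename
	wanted <- contents[ basename( contents ) == basename( catalog_row$output_filename ) ]

	# unzip that file alone into the folder the catalog already points to,
	# so no extra rows ever need to be inserted
	unzip( zip_path , files = wanted , exdir = dirname( catalog_row$output_filename ) )
}
```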
does that behave differently than https://github.com/ajdamico/lodown/commit/bca20c3087fe3bdd3ea96c8f0f0281eb81365aee? thanks!
yes (i tested both as well). edit: i just realised those weren't clean tests (i was fiddling with the catalog), but i've done a clean one now and still no joy.
hi, the catalog insertion is the problem: i'd rather remove the output_filename column, just store everything into the output_directory that's available, and not insert an additional row into the catalog. i'll look closer at this next year
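something along these lines is what i have in mind (just a sketch, not lodown's actual code; the helper name and the output_directory column are assumptions here):

```r
# sketch only: keep the catalog read-only. no output_filename column,
# no inserted rows; just extract everything into the row's directory
extract_all_to_directory <- function( zip_path , catalog_row ){

	# unzip every file the archive contains into the catalog's output_directory
	unzip( zip_path , exdir = catalog_row$output_directory )

	# the catalog itself stays exactly as it was
	invisible( catalog_row )
}
```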
yeah - but think about it this way: why do you need a catalog insertion? because they added an extra file that isn't in the catalog, but one the user can download separately and merge with this one if they want. so i say just ignore the extra file and keep the integrity of the catalog; all the data is there anyway.
if you think all multiple-file downloads from dhs are mistakes, i agree you can just add a hardcoded removal! thanks
i am sure they would find the word 'mistake' a bit strong, but it is still inexplicable to me why they did it. the files that i checked perfectly duplicate the data already available in other files. (i can't check them all right now, i'm working off mobile data :)
i've made a bunch of changes so closing but can reopen if i'm being stupid. thanks
thanks, i'll have a look asap. hope you had a good break ;)
OK, here it is. This extracts all the .zip files if nested zip files exist. It also adds a row to the catalog, and even gives a semi-correct message at the end of each file extraction; the only issue there is that you cannot know in advance whether one of these files is in your catalog, so it is impossible to always correctly know the total number of entries until the end.
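A stripped-down sketch of the idea (the actual code in the PR differs in the details; the helper name and arguments here are just illustrative):

```r
# sketch only: if the downloaded archive contains further zip files,
# extract each of them as well and append one catalog row per nested archive
extract_nested_zips <- function( zip_path , catalog , i , exdir ){

	# extract the outer archive and record the files it produced
	files <- unzip( zip_path , exdir = exdir )

	# find any zip files that were themselves inside the archive
	nested <- files[ grepl( "\\.zip$" , files , ignore.case = TRUE ) ]

	for( this_zip in nested ){

		# extract the nested archive into the same directory
		unzip( this_zip , exdir = exdir )

		# append a copy of the current row so the catalog reflects the extra file
		catalog <- rbind( catalog , catalog[ i , ] )
	}

	# return the (possibly longer) catalog
	catalog
}
```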
Other than the entry-count issue, it's great. :) Mind you, this still won't work on Windows; I think that could actually be solved just by extracting into a different folder than the download folder. I can't test it now, but I can check back in a few weeks when I get back to my work computer.
And thanks for this opportunity, I learned a tonne! (sorry about the spacing, I have no idea what is going on there!)