cosmir / openmic-annotator

Annotation framework for annotating data for OpenMIC
MIT License

Ejh 20161229 iss37 audio upload #40

Closed ejhumphrey closed 7 years ago

ejhumphrey commented 7 years ago

First pass at an audio uploading command line tool.

Next steps include user authentication (whitelisting / admin permissions) and a better grasp on the kinds of metadata we're getting from Jamendo.



coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 89.51% when pulling b202844b207a6d367a5658f87717510569d1fab6 on ejh_20161229_iss37_audio_upload into 546bd468861ab8310e5a678c5e9120bf9da9ce0a on master.

ejhumphrey commented 7 years ago

@ffont + @alastair, I poked y'all as stakeholders in the uploader CLI ... this is equal parts RFC and PR, curious to hear what you think and if / how this will serve our collective needs.

couple things to note:

ffont commented 7 years ago

Reviewed 5 of 5 files at r1. Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

ffont commented 7 years ago

I'm not sure if I managed to do the review properly with this Reviewable thing. Why don't we simply use GitHub's code review tools?

Anyway, it all looks good to me so far. My only real concern is that if we upload large numbers of files, we will almost certainly need to pause and resume the process at some point, so it would be great to support that somehow. One way is to ask the server whether a file has already been uploaded before uploading it, but this requires a request per resource. Another way is to parse the generated logs to figure out which resources have already been uploaded successfully. From what I understand, a different log file is created on each run of the tool (https://github.com/cosmir/open-mic/blob/b202844b207a6d367a5658f87717510569d1fab6/scripts/audio_uploader.py#L91), so we would need to parse all of them (still much faster than an extra request per resource).
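The log-parsing option can be sketched as follows. This is a minimal sketch, assuming each run writes one JSON object per line with hypothetical `file` and `status` fields, and that the per-run logs match a glob pattern such as `logs/*.log`; neither is confirmed to be what audio_uploader.py actually writes. Reading the logs is purely local I/O, which is why it beats one server request per resource.

```python
import glob
import json


def successful_uploads(log_lines):
    """Return the set of files a log marks as successfully uploaded.

    Assumes one JSON object per line, e.g. {"file": "a.mp3", "status": "success"};
    the field names are hypothetical.
    """
    done = set()
    for line in log_lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # blank or truncated line, e.g. from an interrupted run
        if isinstance(record, dict) and "file" in record and record.get("status") == "success":
            done.add(record["file"])
    return done


def successes_from_logs(pattern):
    """Union of successes across every per-run log file matching `pattern`."""
    done = set()
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            done |= successful_uploads(fh)
    return done


def remaining(filelist, done):
    """Files still to upload, in their original order."""
    return [f for f in filelist if f not in done]
```

Unparseable lines are skipped rather than treated as fatal, since a run that was killed mid-write can leave a truncated last line.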

Also, why do we log everything at the end of execution? If we are going to rely on the logs, we should write to them after each job (or in small chunks of N jobs) so we don't lose information if something goes wrong.
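Per-job logging could look like the sketch below, assuming a hypothetical `upload_fn(path)` that raises on failure and a hypothetical JSON-lines record shape. With `flush_every=1` each record is handed to the operating system as soon as its job ends; a larger `flush_every` gives the "small chunks of N jobs" variant, at the cost of up to N-1 buffered records being lost if the process is killed. Flushing does not fsync, so an OS-level crash can still lose recent lines.

```python
import json


def upload_all(files, upload_fn, log_fh, flush_every=1):
    """Upload files in order, writing one JSON line per finished job.

    `upload_fn(path)` is a hypothetical stand-in for the real HTTP upload:
    it returns normally on success and raises on failure.
    """
    for count, path in enumerate(files, start=1):
        try:
            upload_fn(path)
            record = {"file": path, "status": "success"}
        except Exception as exc:  # log the failure and move on to the next file
            record = {"file": path, "status": "error", "error": str(exc)}
        log_fh.write(json.dumps(record) + "\n")
        if count % flush_every == 0:  # every job (N=1) or every N jobs
            log_fh.flush()
    log_fh.flush()  # push out any records left from a final partial chunk
```

Opening the log in append mode (`open("out.log", "a")`) would keep earlier runs' records in the same file.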

What do you think?

ejhumphrey commented 7 years ago

thanks! I'll try to break my reply out into its constituent parts.

okay, lot there 😄 ... thoughts?

bmcfee commented 7 years ago

I have no problem dropping Reviewable for this project. I agree that its UI is terrible, but I still like it more than GH for its threading and partial status marking.

coveralls commented 7 years ago


Coverage remained the same at 90.909% when pulling 6af3fe280b7b5cd53f0b7c4d375ee829e3997381 on ejh_20161229_iss37_audio_upload into 531ef0658c9b0f5cddca863c9616cdd04edda3df on master.

ejhumphrey commented 7 years ago

opened issue #41 about gh-review, brought this up to date with master (no conflicts). I'll take a look at failsafe logging after lunch.



ejhumphrey commented 7 years ago

update: my logging suggestion isn't going to work; I think we'll need to use the built-in logging functionality, writing out serialized JSON objects as text. ¯\_(ツ)_/¯
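One way to do this with the standard library is a dedicated logger whose formatter emits only the message, where each message is a `json.dumps` string. This is a sketch; the logger name `upload_results` and the record fields are assumptions for illustration. `FileHandler` flushes after every record, and `json.dumps` escapes embedded newlines, so each result lands on its own line as soon as it is logged.

```python
import json
import logging


def make_upload_logger(path, name="upload_results"):
    """Return a logger that appends one JSON object per line to `path`.

    The logger name and record fields are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep result records out of the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(path, mode="a")  # append across runs
    handler.setFormatter(logging.Formatter("%(message)s"))  # no timestamp/level prefix
    logger.addHandler(handler)
    return logger


def log_result(logger, path, status, **extra):
    """Write one upload result as a single JSON line."""
    logger.info(json.dumps(dict(file=path, status=status, **extra)))
```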

ejhumphrey commented 7 years ago

and in case anyone is following along at home, my threading fears were unfounded: https://docs.python.org/3/library/logging.html#thread-safety
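Per the linked documentation, the logging module uses locks internally, and each handler serializes access to its own I/O, so worker threads can share one logger without interleaving partial lines. A minimal sketch, assuming uploads run in a `concurrent.futures.ThreadPoolExecutor` and using a hypothetical `upload_fn`:

```python
import json
import logging
from concurrent.futures import ThreadPoolExecutor


def upload_concurrently(files, upload_fn, logger, max_workers=4):
    """Upload `files` from a thread pool, logging each result as it finishes.

    All workers log through the same `logger`; each handler's lock makes
    every emitted record atomic with respect to other threads.
    `upload_fn` is a hypothetical stand-in for the real HTTP upload.
    """
    def job(path):
        try:
            upload_fn(path)
            status = "success"
        except Exception:
            status = "error"
        logger.info(json.dumps({"file": path, "status": status}))
        return path, status

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even though the log
        # lines above are written in completion order.
        return dict(pool.map(job, files))
```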

ejhumphrey commented 7 years ago

@ffont I think I've mostly taken care of logging and pausing / resuming in d72f813. It's a two-step process, but I think it gets the job done. It needs some testing, but I'm not sure how best to do that just yet.

coveralls commented 7 years ago


Coverage remained the same at 90.909% when pulling d72f813abeb37378731e5c3a52c7845ddd25bccb on ejh_20161229_iss37_audio_upload into 531ef0658c9b0f5cddca863c9616cdd04edda3df on master.

ffont commented 7 years ago

Reviewed 1 of 5 files at r1, 3 of 3 files at r2, 2 of 2 files at r3. Review status: all files reviewed at latest revision, 2 unresolved discussions.



ffont commented 7 years ago

Sorry for the mess with Reviewable and GitHub reviews :( I really don't know how to make the pending Reviewable check pass...

Looks good. The idea is to chain both commands, right? Something like:

python audio_uploader.py path/to/filelist.json http://example.com/upload/ --log-file out.log

--> pause

python filter_successful_uploads.py path/to/filelist.json out.log path/to/remaining_filelist.json
python audio_uploader.py path/to/remaining_filelist.json http://example.com/upload/ --log-file out.log

--> pause

python filter_successful_uploads.py path/to/filelist.json out.log path/to/remaining_filelist.json
python audio_uploader.py path/to/remaining_filelist.json http://example.com/upload/ --log-file out.log

...

Maybe we could simply call the filter_successful_uploads.py functions (parse_log and filter_successes) from audio_uploader.py to ease the workflow?
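Folding the filter step into the uploader might look like the sketch below. `parse_log` and `filter_successes` here are hypothetical stand-ins written for this example (the real functions in filter_successful_uploads.py are not shown in this thread, so their signatures may differ); `upload_fn` and the JSON-lines fields are likewise assumptions. Because results are appended to a single log, re-running the same command resumes where the previous run stopped.

```python
import json
import os


def parse_log(log_path):
    """Hypothetical stand-in: files logged as successful in `log_path`."""
    if not os.path.exists(log_path):
        return set()  # first run: nothing uploaded yet
    done = set()
    with open(log_path) as fh:
        for line in fh:
            try:
                record = json.loads(line)
            except ValueError:
                continue  # partial line from an interrupted run
            if isinstance(record, dict) and record.get("status") == "success":
                done.add(record.get("file"))
    return done


def filter_successes(filelist, done):
    """Hypothetical stand-in: drop files that were already uploaded."""
    return [f for f in filelist if f not in done]


def resume_upload(filelist, log_path, upload_fn):
    """One-step resume: skip logged successes, upload the rest, append results."""
    todo = filter_successes(filelist, parse_log(log_path))
    with open(log_path, "a") as log_fh:
        for path in todo:
            try:
                upload_fn(path)
                status = "success"
            except Exception:
                status = "error"
            log_fh.write(json.dumps({"file": path, "status": status}) + "\n")
            log_fh.flush()  # keep the log current in case this run is interrupted too
    return todo
```

Under this design the pause/resume loop collapses to re-running a single command against the original file list, provided the uploader appends to its log file rather than overwriting it.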

ejhumphrey commented 7 years ago

stupid Reviewable ... shipping this one through, will clean up any mess it causes