IBM / mi-prometheus

Enabling reproducible Machine Learning research
http://mi-prometheus.rtfd.io/
Apache License 2.0
42 stars 18 forks source link

COG Dataset #78

Closed sesevgen closed 5 years ago

sesevgen commented 5 years ago

This is a basic implementation of the COG dataset, and the VideoTextToClassProblem class.

The user can pick either of the two pregenerated datasets, or generate a new one. Currently, the user needs to generate the dataset themselves using COG's provided tools, but this can probably by implemented through the .yaml interface later.

The user can also filter by tasks. Options are 'class', 'reg' or a custom list such as ['Task1', 'Task2'...]. Unselected tasks are not loaded.

COG stores data as .jsons and images are generated on the fly. Currently, initialization loads the entire json dataset (it's not very large), but image generation is done when getitem is called.

An issue is that it currently derives from the newly implemented VideoTextToClassProblem, however COG has tasks that are not classification. Perhaps it could fit in seq2seq, but I would like advice on how to place it better.

The cog_utils subfolder has Google/Cog code with very minor modifications. Is there any copyright issue?

tkornuta-ibm commented 5 years ago

This pull request introduces 28 alerts when merging 069554844898f5c887306093c0b1fcd53e074b44 into d8c20f1e2a50e21c4a80ce8fecdb7e480395b4d3 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

sesevgen commented 5 years ago

Alerts above (except for an exception handling issue) are produced by Google's COG code, which is included in cog_utils.

tkornuta-ibm commented 5 years ago

This pull request introduces 27 alerts when merging c62e3f78ceb3bd4160607a2ca1487d0d2495f5f1 into d8c20f1e2a50e21c4a80ce8fecdb7e480395b4d3 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

This pull request introduces 27 alerts when merging 3b6f690c58973ed081b3dc2be3d3ba5af240b72e into d8c20f1e2a50e21c4a80ce8fecdb7e480395b4d3 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

sesevgen commented 5 years ago

Added a test timer for loading pregenerated images versus generating them on-the-fly on @tkornut 's request.

Loading is a bit faster, but extremely memory intensive for a naive save-load scheme. (Appx. 100x more memory needed).

tkornuta-ibm commented 5 years ago

This pull request introduces 27 alerts when merging c056bcb1ed679c7111773a67408251cafbbf311a into d8c20f1e2a50e21c4a80ce8fecdb7e480395b4d3 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

Hey, I just want to clarify: I am not the one posting those messages "This pull request introduces 27 alerts"... seems that LGTM somehow "impersonated my account" :]

tkornuta-ibm commented 5 years ago

Disregarding that, of course it would be great if you could fix the indicated alerts before merging into develop:

https://lgtm.com/projects/g/IBM/mi-prometheus/rev/pr-bd7196a0cc97a6569052aa506a7f8d0e5d382b65

sesevgen commented 5 years ago

I think I am used to a different philosophy when it comes to external code. @tkornut I tried to keep the original COG code as intact as possible. Currently, there is no actual dependence of TensorFlow. I just included the COG code in its entirety as provided by Google with as little modifications as possible. Most of it is not used by prometheus, including anything that imports or calls TF.

We can discuss in person today, but sounds like you'd prefer for me to edit the source more heavily and only keep what is necessary?

I will address other comments.

tkornuta-ibm commented 5 years ago

This pull request introduces 14 alerts when merging b6f6bc6470e1fde26d944d2053d6da6604f8424d into d8c20f1e2a50e21c4a80ce8fecdb7e480395b4d3 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

sesevgen commented 5 years ago

I addressed comments from this morning.

I will bundle data generation with the problem_initializer helper class.

tkornuta-ibm commented 5 years ago

This pull request introduces 9 alerts when merging 4c300db899e5a97bd6b3dbe031defc8dfc92ef99 into 2dfafe6087743e0de0f169971ab89d218f9a9cec - view on LGTM.com

new alerts:


Comment posted by LGTM.com

sesevgen commented 5 years ago

@vmarois Thanks for comments! Regarding non-used classes. I haven't committed to a full purge as I will be introducing COG dataset generation with the problem initialize helper, which will use some classes currently unused. We can do a final cleanup then.

I will look into substituting PIL instead of CV2.

tkornuta-ibm commented 5 years ago

This pull request introduces 9 alerts when merging 81647a020fb79c5f9a544052561e9ea1c3d986b5 into 2dfafe6087743e0de0f169971ab89d218f9a9cec - view on LGTM.com

new alerts:


Comment posted by LGTM.com

sesevgen commented 5 years ago

Done! Addressed @vmarois 's comments and ported from OpenCV to PIL.

tkornuta-ibm commented 5 years ago

This pull request introduces 14 alerts when merging be046cebdfa591b8d0ce81d1e8f511850a7738d0 into 2dfafe6087743e0de0f169971ab89d218f9a9cec - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

This pull request introduces 14 alerts when merging 58a0025a60586af448d3abfa3e6283943068125f into 2dfafe6087743e0de0f169971ab89d218f9a9cec - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

This pull request introduces 9 alerts when merging b169acdb371d3210aa967ab71f44e717bebd3f6a into 59d4419e89c9d55609e946eaf157c714ba3b9264 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

This pull request introduces 9 alerts when merging 7b1699e9ed2d5e34bd75f59c22cd241a9dddbab6 into 59d4419e89c9d55609e946eaf157c714ba3b9264 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

This pull request introduces 7 alerts when merging 608a47660ee42f3927976be20ad431dd079bb3d3 into 59d4419e89c9d55609e946eaf157c714ba3b9264 - view on LGTM.com

new alerts:


Comment posted by LGTM.com

tkornuta-ibm commented 5 years ago

This pull request introduces 10 alerts when merging 79eb99502ea2e0bb0cff81f852f6ec4895b8647f into 59d4419e89c9d55609e946eaf157c714ba3b9264 - view on LGTM.com

new alerts:


Comment posted by LGTM.com