U-Alberta / ADaPT-ML

MIT License

Consider incorporating a tutorial which walks the user through using the example data #11

Open wincowgerDEV opened 2 years ago

wincowgerDEV commented 2 years ago

I have gotten to Step 2 in the usage guidelines. At this point, the guidelines become all about calibrating configuration files, but the user is not yet familiar with the basics of how any of these applications work or what the workflow will be like. As a non-expert in machine learning, I still find it challenging to see how everything is going to fit together. At this point, I would recommend pointing the user to the example usage on the main page, https://github.com/U-Alberta/ADaPT-ML. However, when I started to walk through it at Step 2 (create a gold dataset using Label Studio), where the first script is, the code did not make my Label Studio project reflect what is presented in the example usage. This may require some reorganization of the example and configuration files so that the user can get the example working quickly.

nulberry commented 2 years ago

I've added two things to the documentation in a6ca9df8c4f7f67e19e5f1dd5d393b6d05baf023:

  • a note about following the example use case after reviewing system requirements
  • specific (but brief) instructions on how the example project was set up using the Label Studio UI. Following the UI is the easiest, most user-friendly way to init the project IMO.

wincowgerDEV commented 2 years ago

I will give it a shot

--

´¯·.¸¸.·´¯·.´¯·.¸¸.·´¯ツ ------------------------------

Win Cowger, PhD Pronouns: he/him Research Scientist Moore Institute for Plastic Pollution Research

Contact Info

515-298-3869 | @.*** | @Win_OpenData https://twitter.com/Win_OpenData

Websites Personal Website: www.wincowger.com Currently Employed: https://mooreplasticresearch.org/ Alumni Of: https://www.thegraylab.org/ Project Websites: www.openspecy.org Research Gate: https://www.researchgate.net/profile/Win-Cowger Github: https://github.com/wincowgerDEV OSF: https://osf.io/kxeh5/

wincowgerDEV commented 2 years ago

I like that you have the note about following the example use case mentioned now. When I go to the example use case and start with Step 1, I can't find a clear path for following along with the tutorial. For example, where is the data stored that I am supposed to import into CrateDB? How do I get it into CrateDB? I tried to skip ahead to Step 2 but ran into similar questions. I ran the command "docker exec label-studio-dev python ./ls/sample_tasks.py example_data txt 30 example --filename example_tasks.json" and read that the data should be in $LS_TASKS_PATH, but I do not see it populated in my computer's directories. Perhaps it is in the Docker container? If so, how would one access it to load it into Label Studio?
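For readers following along, here is a minimal Python sketch of the kind of work a sampling step like this performs: randomly pick rows and write them as Label Studio import tasks into a tasks directory. It is not the project's sample_tasks.py. Reading the command's arguments as (table example_data, text column txt, sample size 30) is an assumption, and the id/text keys inside each task are hypothetical field names. The one grounded detail is Label Studio's JSON import format: a list of objects whose "data" key holds the fields the labeling config refers to.

```python
import json
import random
from pathlib import Path


def sample_tasks(rows, text_column, n, seed=None):
    """Randomly sample up to n rows and wrap each one as a Label Studio task.

    Hypothetical field names: each task's "data" holds an "id" and a "text"
    key; a real labeling config may expect different keys.
    """
    rng = random.Random(seed)
    chosen = rng.sample(rows, min(n, len(rows)))  # never ask for more than exist
    return [{"data": {"id": row["id"], "text": row[text_column]}} for row in chosen]


def write_tasks(tasks, tasks_dir, filename):
    """Write the task list as JSON to <tasks_dir>/<filename>, creating dirs as needed."""
    out_dir = Path(tasks_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Illustrative rows standing in for a CrateDB query result.
    rows = [{"id": i, "txt": f"example text {i}"} for i in range(100)]
    tasks = sample_tasks(rows, "txt", 30, seed=0)
    print(write_tasks(tasks, "example_data/ls/tasks", "example_tasks.json"))
```

Per the maintainer's reply in this thread, $LS_TASKS_PATH maps to ./example_data/ls/tasks relative to the repo root, so passing that path as tasks_dir writes to the same host directory the containers share.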

wincowgerDEV commented 2 years ago

BTW, thanks for adding all this newbie stuff for newbies like me. I am learning a lot and could likely see myself using this tool in the future.

wincowgerDEV commented 2 years ago

I think I may have found the example data from Step 2; is this right? If so, you might want to change $LS_TASKS_PATH to example_data\ls\tasks, or explain what $LS_TASKS_PATH refers to and where it is set.

nulberry commented 2 years ago

In c07ec8455e78448de593d32ccc8a2c222e15a3fe, I've added a note in the Example Use Case section for users who, in addition to reading over the steps, would like to follow along on their machine like you are doing. It clarifies that the file paths specified by environment variables like $LS_TASKS_PATH are in the .env file, so that they can reference where things are happening.

The example data stored in CrateDB is in ./crate relative to the repo root. I found writing Step 1 for the Example Use Case a bit tricky because pretty much everything in this step has already happened behind the scenes. ADaPT-ML provides CrateDB, but there are no scripts or programs that take a user's data and create a table in CrateDB for them. I did not make this a part of ADaPT-ML because I was unsure of how to handle the many possibilities for data types, file formats, etc., with one program. All of this to say, do you think that it would be good for me to mention somehow that there isn't a way to "follow along" with Step 1 besides accessing the CrateDB UI and looking at the example data table?

There's also the option for me to describe the way I loaded the data into CrateDB as a suggestion within the usage guidelines, just so that users have something to work off of when they are deciding how they want to featurize and load their data into CrateDB. Right now, this is what I have under Step 6 of the usage guidelines:

Then it's ready! Import your data into a table in CrateDB and refer to the Example Usage (https://github.com/U-Alberta/ADaPT-ML/blob/main/README.md) for an example of how to manipulate the data so that it's ready for ADaPT-ML. How you load the data, featurize it, and sample from it to create your unlabeled training data is up to you; ADaPT-ML does not perform these tasks. However, there may be an opportunity for certain sampling methods to become part of the system; see Contributing (https://github.com/U-Alberta/ADaPT-ML/blob/main/usage.md#community-guidelines).
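Since ADaPT-ML leaves loading to the user, the following is only a minimal sketch of one way to do it from Python, assuming CrateDB's HTTP endpoint is reachable at its default port (localhost:4200) and that a table like the hypothetical example_data(id, txt) suits your data. It sends one parameterized INSERT with CrateDB's bulk_args field so that many rows go in a single request. It is not the project's own import script.

```python
import json
import urllib.request

# Assumption: CrateDB's HTTP SQL endpoint on its default port.
CRATE_URL = "http://localhost:4200/_sql"

# Hypothetical table and columns; adapt to your own data.
CREATE_TABLE = {
    "stmt": "CREATE TABLE IF NOT EXISTS example_data (id TEXT PRIMARY KEY, txt TEXT)"
}


def build_bulk_insert(table, columns, rows):
    """Build a _sql payload that inserts many rows with one parameterized statement.

    Table and column names are interpolated directly, so they must be trusted
    values; row values are passed separately as bulk_args.
    """
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return {"stmt": stmt, "bulk_args": [[row[c] for c in columns] for row in rows]}


def execute(payload, url=CRATE_URL):
    """POST a payload to CrateDB and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a running CrateDB, execute(CREATE_TABLE) followed by execute(build_bulk_insert("example_data", ["id", "txt"], rows)) would create and fill the table. Featurization would happen before this step, adding more columns to each row.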

And yes, that's right: $LS_TASKS_PATH corresponds to ./example_data/ls/tasks relative to the root of the repo. All files created within the Docker containers are on shared volumes with the host machine so that nothing gets lost if the containers are stopped. It took me a while to wrap my head around Docker 😆, but it certainly saves a lot of time and headaches down the road for large systems like ADaPT-ML!
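To see where a variable like $LS_TASKS_PATH points on the host, one can read it from the .env file and resolve it against the repo root. This is a minimal sketch, assuming the .env file uses plain KEY=VALUE lines (no export prefixes, variable interpolation, or multi-line values). The value ./example_data/ls/tasks comes from the reply above; the helper names are hypothetical.

```python
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines from a .env file.

    Blank lines, comment lines, and lines without '=' are skipped; matching
    single or double quotes around a value are stripped.
    """
    env = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def host_path(env, var, repo_root="."):
    """Resolve a path variable such as LS_TASKS_PATH relative to the repo root."""
    return (Path(repo_root) / env[var]).resolve()
```

For example, host_path(read_env(".env"), "LS_TASKS_PATH") run from the repo root would give the absolute host directory where the sampled task files appear.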

wincowgerDEV commented 2 years ago

Thanks for this :)

Some responses below: "All of this to say, do you think that it would be good for me to mention somehow that there isn't a way to "follow along" with Step 1 besides accessing the CrateDB UI and looking at the example data table?"

True that about Docker, still takes me a while to figure out what is going on but you are demonstrating its power in a big way with your tool here.

Warm Regards, Win


nulberry commented 2 years ago

Thank you for your suggestions. To give a general idea of how users can import their data, I have added the Python script I used to import the example data, with some inline comments giving a bit of detail on LF and ML featurization. It is example_data/example_data_import.py. It is not intended to be run, as that would require setting up a virtual environment with extra dependencies and is outside the current scope of ADaPT-ML, but I linked to it in the parts of the docs you've quoted so it can be read over as users follow along. I didn't edit the .gitignore properly on the first go, so you will find these changes in 34cc9744b7e297e7418152dab1c523b69d4a91ca and 02d87a22e71a6614c6c1240c64d22c54f3df7c41. Thanks again for all of your help in improving my understanding of users' mental models.
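As a rough illustration of the split between LF and ML featurization mentioned above, the sketch below keeps lowercase tokens for labeling functions to match against and builds a fixed-size hashed bag-of-words vector as ML features. Both feature choices (the hashing trick in particular) and the column names are hypothetical stand-ins; example_data/example_data_import.py may featurize the data quite differently.

```python
import hashlib
import re

# Simple lowercase word tokenizer (hypothetical; real pipelines often use spaCy etc.).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def hashed_bow(tokens, dim=16):
    """Count tokens into `dim` buckets using a stable hash (md5, not Python's salted hash)."""
    vec = [0] * dim
    for tok in tokens:
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1
    return vec


def featurize(row_id, text, dim=16):
    """Return one row with raw text, LF-facing tokens, and an ML feature vector."""
    tokens = tokenize(text)
    return {
        "id": row_id,
        "txt": text,
        "lf_tokens": tokens,                      # for labeling functions to inspect
        "ml_features": hashed_bow(tokens, dim),   # fixed-length input for a classifier
    }
```

A fixed-length vector keeps every row the same shape for the ML model, while the token list stays human-readable for writing labeling functions.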

wincowgerDEV commented 2 years ago

Thanks for these updates. The JOSS editor recommended that I make sure a minimal example of everything working is reproducible on my machine and I feel like we have achieved that at this point because I can see how myself or others would start to work with the ADaPT-ML framework. It sounds like some of my requests for a fully worked example are currently beyond the scope of the project. I feel comfortable recommending the manuscript be accepted and will do so unless there is something else you would like me to try out on my device.


nulberry commented 2 years ago

I can't think of anything specific that needs attention. I am still trying to get the proper Docker daemon running on the GitHub-hosted Windows runner, but the runner seems very unlike any setup the average user would have. I think this work can carry on separately, as it is more an issue of customizing the default Docker installation on the runner than an issue with the system's functionality.

wincowgerDEV commented 2 years ago

Yeah, there is always more we can do with software dev :). Looking forward to seeing how this evolves. Let me know if you need anything in the future; I will be happy to help even after the review is done. For now, I will mark my part as done in the review repo.

Warm Regards, Win


nulberry commented 2 years ago

Thank you so much, that means a lot! I am grateful for all of your contributions to this project 😄