UF-Carpentry / Coordination

Documentation and material for instructors and organizers

Machine learning workshop with the USDA Agricultural Research Service #99

Closed arivers closed 5 years ago

arivers commented 5 years ago

On April 30 I met with the board about teaming up to put together a 2-day machine learning workshop based on Google's Machine Learning Crash Course.


We plan to have 20-30 USDA-ARS participants and provide 3-4 instructors.

We would want help from UF Carpentries in:

I am also planning to contact Andy Li at the NSF Center for Big Learning on campus about potential partnerships.

ha0ye commented 5 years ago

@arivers If it's alright with you, I can try and put the word out about your workshop to the UF Informatics-Training listserv to see if there are interested folks.

Also, do you know if you'll need/want HPC support during the workshop? (sorry if this is covered in the course content - I haven't taken a look at that yet)

arivers commented 5 years ago

@ha0ye, yes, feel free to put the information on the Informatics-Training listserv.

I don't think we will need HPC support. Most of the datasets for training are smaller.

MarconiS commented 5 years ago

Hi @arivers, I am interested in participating (either as instructor or helper), but I have quite a lot going on this summer, so I am wondering how much of a time commitment you all expect building the material to need. At first glance that's a LOT of material to cover in 2 days. Thanks!

stuckyb commented 5 years ago

I agree about not needing HPC support. My feeling is that if this workshop requires HPC resources, we're probably doing it wrong!

Also, I'd be interested in helping out with this workshop, either as instructor or helper.

arivers commented 5 years ago

Hi @MarconiS,

I think there are several levels at which someone could participate.

The most involved would be as a co-creator, working with me on modifying the lessons, testing out the material and organizing the workshop.

The next most involved would be as a co-instructor, teaching a chunk of the course, doing the live coding, etc.

The smallest time commitment would be acting as a mentor during the workshop, going around helping students do the exercises and answering questions.

gklarenberg commented 5 years ago

@arivers I'd be interested in helping out, but also have some travel going on this summer. Do you have an idea of when to do this workshop?

MarconiS commented 5 years ago

Gotcha! More than happy at least to be involved as a co-instructor; will be glad to carve some time out for assisting you all as co-creator, if you'll need me :)

gaurav commented 5 years ago

I'm also happy to help out in any way I can!

hugedata commented 5 years ago

Hi @arivers,

I'd be interested in co-creating parts of the course and also as an instructor.

Do you have a time-frame for when the course will take place?

Best, Dimitri

arivers commented 5 years ago

We do not have a date set but were thinking of late August.

hugedata commented 5 years ago

The Fall semester starts very early - August 20. Depends on when it will be easier to reserve space on campus, before or after the semester starts.

kokbent commented 5 years ago

Happy to help out on this.

While this shouldn't need HPC, it is still important to have a good machine to work on. An 8-year-old mid-spec laptop is often not a good idea.


Nits11 commented 5 years ago

Hi @arivers, I would be interested in helping. Please let me know roughly when you plan to schedule the workshop. Thanks, Nitya

hugedata commented 5 years ago

Hi @kokbent,

If all participants could have an HPC account, this would make things much easier.

Tools like Keras and TensorFlow are already there, and even GPUs could be used. Of course participants should not run on the login nodes, but on the dedicated development nodes.

Nits11 commented 5 years ago

This is easily possible. We implemented a similar setup in a genomics workshop just a month ago.




kokbent commented 5 years ago

> Hi @kokbent, If all participants could have an HPC account, this would make things much easier.
>
> Tools like Keras and TensorFlow are already there, and even GPUs could be used. Of course participants should not run on the login nodes, but on the dedicated development nodes. Best, Dimitri

Yes, this should be one of the first issues to discuss. I've skimmed through the Google Crash Course, and it seems like the "programming exercises" are done in "Colaboratory", which is a nicely skinned Jupyter notebook hosted on Google's servers that interfaces with Google Drive. It seems to be free, and it's what they use to run TensorFlow and even to train neural networks. This could be an attractive option because all we need is a Google account and a browser.

hugedata commented 5 years ago

It's a good starting point. I have done the whole Google TensorFlow crash course, and it was instructive to run the examples from Python as well. The HPC also provides options to run Jupyter notebooks remotely, so this could be set up if needed.

andorfc commented 5 years ago


I would be happy to help review the course material and act as mentor if needed.


sunray1 commented 5 years ago

I'd be glad to help too if you need anything!

arivers commented 5 years ago

@ha0ye Do you know who would be the best person to help me look for rooms and dates that are available on campus? I was thinking of doing a larger workshop with about 45-50 participants sometime in August or early September. I know the library had a room, but I think it was for smaller groups. We could also potentially rent space at Emerson Hall. Also, are there dates that would be good to avoid based on the UF calendar?

ha0ye commented 5 years ago

@arivers Flora and Alethea (whose emails you should have) would be good to go through first. I'm not sure what the spaces are like in the library, but we were able to reserve some of the CSE lab spaces / classrooms last summer that were big enough.

magitz commented 5 years ago

Kind of late in joining this, but I'd be happy to help co-instruct this. Has there been progress on working on the curriculum? Might be able to help with that if needed.


arivers commented 5 years ago

We have a tentative date and location for the course of August 27-28 at the Reitz Union.

I would like to meet Tuesday, June 11, 2019 at 10:00 AM at the USDA-ARS Lab 1600 SW 23 Dr, Gainesville, FL 32608 to discuss proposed changes to the Google course curriculum for our workshop. Park anywhere onsite and stop by the small administration building (No. 30) between the two large brick buildings to be directed to the conference room.

You can also join by Webex:
Meeting number: 961 217 196
https://ars-usda.webex.com/ars-usda/j.php?MTID=m2f4e9d3596cc6ae20546cf6ff66549e5

Join by phone:
1-888-8449904 Call-in toll-free number (ATT Audio Conference)
1-816-4234261 Call-in number (ATT Audio Conference)
Access code: 481 288 6

Key topics discussed will be:

For reference the link to the Google course is here: https://developers.google.com/machine-learning/crash-course/

Please comment if you intend to come or if you want to participate but cannot come.

The proposed schedule for the course is below.

August 27

Time   Lesson
08:00  Framing ML problems
08:20  Getting started: linear regression and loss
08:40  Reducing loss: iteration, gradient descent, learning rate, stochastic gradient descent
09:40  Getting started with TensorFlow and scikit-learn
10:40  Generalization and the bias-variance tradeoff
11:00  Break
11:30  Splitting training and test data sets
12:00  Lunch break (participants purchase their own meals)
13:30  Validation data sets
14:15  Representations: feature selection, engineering, data cleaning
15:15  Feature crosses: encoding non-linearity
16:15  Regularization: simplicity (L2)
17:15  Adjourn

August 28

Time   Lesson
08:00  Logistic regression
08:30  Classification: thresholding
08:40  Classification: true vs. false and positive vs. negative
08:50  Classification: accuracy
09:00  Classification: precision and recall
09:15  Classification: precision-recall and receiver operating characteristic (ROC) curves
09:45  Classification: prediction bias
10:00  Regularization: sparsity (L1)
11:00  Neural networks
12:00  Lunch break (participants purchase their own meals)
13:30  Training neural nets
14:30  Embeddings
15:30  Overview of methods: classification, regression, clustering, dimensionality reduction
16:30  Resources for doing ML in your lab when you leave
17:00  Answering final questions
17:30  Adjourn

stuckyb commented 5 years ago

I'll be there!

Nits11 commented 5 years ago

Hi Adam,

Thanks for sharing the proposed schedule, which looks great.

Unfortunately, I have our weekly lab meeting at the proposed time.

I would like to follow up on the meeting notes and join the next meeting.




hugedata commented 5 years ago

Hi Adam,

I am planning to attend.

Best, Dimitri

arivers commented 5 years ago

As a reminder, we are meeting at 10:00 AM today at the USDA-ARS Lab 1600 SW 23 Dr, Gainesville, FL 32608 to discuss proposed changes to the Google course curriculum for our workshop. Park anywhere onsite and stop by the small administration building (No. 30) between the two large brick buildings to be directed to the conference room.

You can also join by Webex:
Meeting number: 961 217 196
https://ars-usda.webex.com/ars-usda/j.php?MTID=m2f4e9d3596cc6ae20546cf6ff66549e5

Join by phone:
1-888-8449904 Call-in toll-free number (ATT Audio Conference)
1-816-4234261 Call-in number (ATT Audio Conference)
Access code: 481 288 6

For the next meeting we will schedule a time that works for all interested people.

kokbent commented 5 years ago

I could not join the discussion because I needed to attend an interview. Will there be brief minutes of the meeting?

arivers commented 5 years ago

Yes, I'll post a summary and more information shortly.


arivers commented 5 years ago

I'm moving this coordination discussion to a repository specific to the ML training class so we can divide up issues into multiple threads. The new repo is here:


Please follow that repository to get updates. The results of our first meeting are on that repository in the issues and wiki sections.

arivers commented 5 years ago

We have set up a site and curriculum for the ML course, which will be held on August 27-28: https://usda-ars-gbru.github.io/ml-training-site/

We are still looking for a few more people who are interested in helping or teaching a small module for the course. Please let me know if you are interested and fill out this Doodle poll of potential times you could meet over the next month: https://doodle.com/poll/wtbzz4mafgnciffx . Even if you cannot meet, there are other ways to get involved.