AccelerationConsortium / ac-microcourses

Microcourses hosted by the Acceleration Consortium for self-driving lab topics.
https://ac-microcourses.readthedocs.io/
MIT License

Difficulty with microcourse 1.5, AWS integration with MongoDB #108

Open programlich opened 1 month ago

programlich commented 1 month ago

For me, this course was one of the more difficult ones because, until now, I was completely unfamiliar with any kind of AWS service. This makes the learning curve quite steep, which is nice, but I wonder whether it is really worth focusing so much energy on something that is practically only a workaround. To give you an idea of the level of knowledge I started with and the kinds of errors I made, here is a collection of things I needed to learn because I was doing them wrong at first:

programlich commented 1 month ago

Modify main.py

from netman import connectWiFi
import urequests
from my_secrets import SSID, PASSWORD, COURSE_ID, AWS_API_GATEWAY_URL, AWS_API_KEY
import time
import json

# Join the Wi-Fi network before making any HTTP requests
connectWiFi(SSID, PASSWORD, country="US")

# The API key is sent in the x-api-key header
headers = {"x-api-key": AWS_API_KEY}
# Wrap the payload as a JSON string under "body"
document = {"body": json.dumps({"course_id": COURSE_ID})}

print("Sending document to AWS Lambda")

num_retries = 3
for attempt in range(num_retries):
    response = urequests.post(AWS_API_GATEWAY_URL, headers=headers, json=document)
    # Read what we need, then close the response to free the socket
    txt = str(response.text)
    status_code = response.status_code
    response.close()

    print(f"Response: ({status_code}), msg = {txt}")

    if status_code == 200:
        print("Added Successfully")
        break

    # Only wait and announce a retry if another attempt is left
    if attempt < num_retries - 1:
        print("Retrying in 5 seconds...")
        time.sleep(5)

programlich commented 1 month ago

Here is a loose collection of screenshots of steps that I found important because I had not done them properly on my first try:

[Screenshots: lambda1, lambda2, lambda3, lambda4, lambda5]

programlich commented 1 month ago

I am still thinking about how the setup of this course could be simplified, even though the MongoDB Data API is now disabled. Two ideas I have in mind:

  1. Make a collection available for every user of the microcourse in a central MongoDB instance provided by AC. For course 1.5, only the connection credentials for the AWS Gateway would then need to be provided, and the student could focus on the actual communication and data handling. Drawback: the student does not learn how to set everything up on their own. Pro: less risk of losing motivation, and quicker success. I think this would be a fine approach, considering that this is still the hello world course.
  2. Set up a self-hosted MongoDB on a second Raspberry Pi. I have my own MongoDB instance running on one of our institute's VMs, and managing it via mongosh is not really user-friendly in my opinion. But overall, if I had the choice between learning to self-host my database and getting to know all these new tools and terms around the AWS universe, self-hosting would probably be easier and less confusing, provided the guide is well written. Basically, it's installing MongoDB on the Pi, creating a user, creating a database, and creating a connection string. The biggest drawback here is that this Pi would have to be reachable from the internet, which is a security issue when people play around at home. On the other hand, if all their communication runs within their local network, this would not be a problem.
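
For the last step of idea 2 (creating a connection string), a minimal sketch might look like the following. The user name, password, IP address, and database name are hypothetical placeholders, and the helper `build_mongo_uri` is not part of the course material. It assumes the user was created in the same database that appears in the URI path.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, database, port=27017):
    """Return a MongoDB connection string for a self-hosted instance.

    The user name and password are percent-encoded so that characters
    such as '@' or ':' do not break the URI.
    """
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical values for illustration only
uri = build_mongo_uri("pico_logger", "p@ss:word", "192.168.1.50", "hello_world")
print(uri)
```

A string like this could then be passed to `pymongo.MongoClient(uri)` on any machine that can reach the Pi on the local network.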

sgbaird commented 1 month ago

Thanks for the great suggestions. Definitely a lot going on here.

> Make a collection available for every user of the microcourse in a central MongoDB instance provided by AC. For course 1.5, only the connection credentials for the AWS Gateway would then need to be provided, and the student could focus on the actual communication and data handling. Drawback: the student does not learn how to set everything up on their own. Pro: less risk of losing motivation, and quicker success. I think this would be a fine approach, considering that this is still the hello world course.

Reasonable approach. Might try this out. An alternative would be to drop the idea of direct microcontroller-to-database uploading and instead focus on handling data from the orchestrator.

Maybe you could also look over https://github.com/AccelerationConsortium/ac-microcourses/issues/45 and let me know if you have any follow-up thoughts?

sgbaird commented 1 month ago

If there were a step by step video walkthrough for the AWS and MongoDB setup for module 1.5, do you think that would help address this?

programlich commented 1 month ago

Yes, this would definitely be very helpful, at least to a complete newcomer like me!

programlich commented 1 month ago

If I understand the final step of the course (1.6) correctly, MongoDB is only part of the setup for logging purposes. All the communication between the Pico and the orchestrator runs via the MQTT broker. So I am wondering whether, for the sake of simplicity in the Hello World course, we could just move the logging commands to the orchestrator. This would allow comfortable logging via pymongo, and the overall result would be the same. The more advanced communication via AWS Gateway could then be introduced in course 2 or 3.
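
As a rough sketch of what orchestrator-side logging could look like (not the course's actual implementation): the function below decodes an MQTT payload and stores it through an object with an `insert_one` method, which is what a pymongo `Collection` provides. The name `log_mqtt_payload`, the `logged_at` field, and the in-memory stand-in are all assumptions made for illustration.

```python
import json
import time


def log_mqtt_payload(collection, payload):
    """Decode an MQTT message payload and store it as one document.

    `collection` is anything with an `insert_one` method, such as a
    pymongo Collection. `payload` is the raw bytes from the broker.
    """
    document = json.loads(payload)
    document["logged_at"] = time.time()  # orchestrator-side timestamp
    collection.insert_one(document)
    return document


class InMemoryCollection:
    """Stand-in for a pymongo Collection so the sketch runs offline."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)


collection = InMemoryCollection()
logged = log_mqtt_payload(collection, b'{"course_id": "demo", "value": 1}')
print(logged["course_id"], len(collection.documents))
```

With a real database, `collection` would come from something like `pymongo.MongoClient(uri)[db_name][collection_name]`, and `log_mqtt_payload` would be called with `msg.payload` from the orchestrator's MQTT message callback.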

sgbaird commented 4 weeks ago

> If I understand the final step of the course (1.6) correctly, MongoDB is only part of the setup for logging purposes. All the communication between the Pico and the orchestrator runs via the MQTT broker. So I am wondering whether, for the sake of simplicity in the Hello World course, we could just move the logging commands to the orchestrator. This would allow comfortable logging via pymongo, and the overall result would be the same. The more advanced communication via AWS Gateway could then be introduced in course 2 or 3.

I struggled a lot with this decision during the early implementation period, and I think given the deprecation of the MongoDB Data API, this may be the best route. Moving the AWS Lambda piece to course 3 (robotics) or 4 (software dev.) may make more sense. In particular, there are two major reasons for keeping the AWS Lambda piece:

  1. Gaining exposure to AWS in general, and to how small bits of Python code can run on demand without needing a dedicated machine
  2. Showing people what they need for streaming time-series data to a database without depending on an orchestrator running 24/7. In particular, things like temperature/pressure/humidity/gas (https://github.com/AccelerationConsortium/ac-training-lab/issues/13), accelerometer data, or weight data streamed from a scale are all potentially useful to track without an actively running experimental campaign. "Lab metadata" in a sense.
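
To make the "small bits of Python code that run on demand" idea concrete, here is a hedged sketch of a Lambda-style handler. It is not the handler used in the course. It assumes the incoming event carries a JSON string under `"body"` (as in the `main.py` posted earlier in this thread) and that the database collection is passed in by a hypothetical `make_handler` factory, so the sketch can run without AWS or MongoDB.

```python
import json


def make_handler(collection):
    """Build a Lambda-style handler that writes to `collection`.

    `collection` only needs an `insert_one` method, which a pymongo
    Collection provides.
    """

    def lambda_handler(event, context):
        try:
            document = json.loads(event["body"])
        except (KeyError, TypeError, ValueError):
            # Missing, non-string, or malformed body
            return {"statusCode": 400, "body": json.dumps({"error": "invalid body"})}
        collection.insert_one(document)
        return {"statusCode": 200, "body": json.dumps({"inserted": True})}

    return lambda_handler


class FakeCollection:
    """In-memory stand-in used only to exercise the handler locally."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        self.documents.append(document)


fake = FakeCollection()
handler = make_handler(fake)
ok = handler({"body": json.dumps({"course_id": "demo"})}, None)
bad = handler({}, None)
print(ok["statusCode"], bad["statusCode"])
```

In a deployed function, the collection would typically be created once outside the handler so that it can be reused across invocations.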

Like you said, maybe it's too much to have in the hello world course. I think we'll try first with a walkthrough video and reassess.

sgbaird commented 3 weeks ago

After creating a walkthrough, let's consider moving the "direct upload from microcontroller" piece to https://ac-microcourses.readthedocs.io/en/latest/courses/software-dev/4.8-cloud-server.html#serverless-computing, focused on the example of using a temperature/pressure/humidity/gas sensor.

sgbaird commented 1 week ago

We're planning to make a video walkthrough for now. https://github.com/AccelerationConsortium/ac-microcourses/issues/132

Someone else also reported having issues: https://github.com/AccelerationConsortium/ac-microcourses/discussions/117.