adafruit / circuitpython-org

CircuitPython's website
https://circuitpython.org

Board-specific or project-specific bundles #491

Closed siddacious closed 3 years ago

siddacious commented 4 years ago

It would be awesome to be able to download a mini-bundle that includes all the libraries that are available for a given board.

As an example the download page for the Feather Bluefruit Sense would have a mini bundle with

The same for other boards with on-board sensors that we have libraries for. We'd have to have a way to know which sensors were included with a board. To be really hacky about it, we could grep the .md file for a board for sensor names that we support and build a bundle based off of that.

dhalbert commented 4 years ago

Instead of pre-made bundles, suppose we had a web service on circuitpython.org that would bundle up a set of named libraries into a .zip file. The URL would specify the list of libraries. We could then use such URLs in Learn Guides to download exactly what was needed for a particular project (or board). The guide author could easily construct the URLs to include what was desired.

Conceptually, something like (not real URLs):

https://circuitpython.org/library-zip?libs=lsm6ds,lis3mdl,apds9960,sht31d,bmp280,register,bus_device&project=sense_board&version=5.x

It would dynamically pick out the latest version of those libs and give you a zip file: circuitpython-libraries-sense-board-20200601-5.x.zip.
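A rough sketch of how such a request might be interpreted, using only Python's standard library. The parameter names (`libs`, `project`, `version`) come from the example URL above; the function names, the fallback values, and the underscore-to-hyphen conversion are assumptions made for illustration, not an existing circuitpython.org API.

```python
from datetime import date
from urllib.parse import parse_qs, urlsplit


def parse_bundle_request(url):
    """Split a library-zip URL into its library list, project, and version."""
    query = parse_qs(urlsplit(url).query)
    libs = query["libs"][0].split(",")
    # The fallbacks "custom" and "latest" are hypothetical defaults.
    project = query.get("project", ["custom"])[0]
    version = query.get("version", ["latest"])[0]
    return libs, project, version


def bundle_filename(project, version, build_date):
    """Name the zip after the project, build date, and CircuitPython version."""
    slug = project.replace("_", "-")  # assumed: sense_board becomes sense-board
    return f"circuitpython-libraries-{slug}-{build_date:%Y%m%d}-{version}.zip"


url = (
    "https://circuitpython.org/library-zip"
    "?libs=lsm6ds,lis3mdl,apds9960,sht31d,bmp280,register,bus_device"
    "&project=sense_board&version=5.x"
)
libs, project, version = parse_bundle_request(url)
print(libs)
print(bundle_filename(project, version, date(2020, 6, 1)))
```

With the date fixed to 2020-06-01, the second line printed matches the file name given in the comment above.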

A user could also create their own bundle dynamically: there could be a table listing all the libraries, with check boxes to choose what you want. There'd be a check box at the top to select all.

Maybe the guide zip links would just go to the bundle generation page and pre-check the selected libraries. Then the user could select the version and customize the choices as needed.

Cloudfront or something could cache them so we don't spend a lot of time regenerating the popular bundles on the back end.

@kattni @ladyada @tannewt Watchya think?

dglaude commented 4 years ago

I think I discussed this need "in the weeds" a few months ago, where I suggested an easy bundle for the CLUE or other boards full of sensors. There was a consensus that it would be nice to have, but maybe no follow-up.

That could be used in Learn guides too, where my pain is gathering all the needed files from the latest bundle. It is also painful because there are both folders and files in the root, and my operating system sorts folders (on top) and files (at the bottom) separately. So in practice, for every lib I need, I have to search twice (except when the guide shows clearly whether it is a file or a folder).

I don't care about the implementation, but I like @dhalbert's solution, especially if it makes it easy to create your own bundle for a personal project, an unpublished project, or one not published on Learn.

tannewt commented 4 years ago

@dhalbert Sounds good to me.

makermelissa commented 4 years ago

The main issue with building bundles on the fly is that we use Jekyll, which builds a static website ahead of time when any changes are added to the repository. This would be trivial in something like PHP, but in order to use it with our current setup, we would need to have pre-defined bundles or use some external service.

v923z commented 4 years ago

The main issue with building bundles on the fly is that we use Jekyll, which builds a static website ahead of time when any changes are added to the repository. This would be trivial in something like PHP, but in order to use it with our current setup, we would need to have pre-defined bundles or use some external service.

In essence, @dhalbert is suggesting a dumb and minimal server that does nothing, but collects a handful of files from pre-defined locations (these .mpy files are on github, aren't they), and serves the zipped file to the user. You don't need external services for that. In fact, you could take something like flask, and then with 20 lines of python code you are done.
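To make the "collect a handful of files and zip them" idea concrete, here is a minimal sketch. Flask is what the comment suggests; this sketch leaves out the HTTP layer entirely and uses only the standard library, so it shows just the zipping step. The names `build_zip`, `fetch`, and `FAKE_STORE`, the `lib/` prefix, and the placeholder file contents are all hypothetical.

```python
import io
import zipfile


def build_zip(names, fetch):
    """Zip up the named libraries; `fetch(name)` returns that library's bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # Assumed layout: every library is a single .mpy file under lib/.
            archive.writestr(f"lib/{name}.mpy", fetch(name))
    return buffer.getvalue()


# Stand-in for the "pre-defined locations": an in-memory dict, not GitHub.
FAKE_STORE = {
    "adafruit_bmp280": b"placeholder bmp280 bytes",
    "adafruit_sht31d": b"placeholder sht31d bytes",
}

payload = build_zip(["adafruit_bmp280", "adafruit_sht31d"], FAKE_STORE.__getitem__)
with zipfile.ZipFile(io.BytesIO(payload)) as check:
    print(check.namelist())
```

A web handler would return `payload` as the response body; swapping `FAKE_STORE.__getitem__` for a function that downloads the file is the only other change needed.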

makermelissa commented 4 years ago

In essence, @dhalbert is suggesting a dumb and minimal server that does nothing, but collects a handful of files from pre-defined locations (these .mpy files are on github, aren't they), and serves the zipped file to the user. You don't need external services for that. In fact, you could take something like flask, and then with 20 lines of python code you are done.

I understand that. The website is static, so it doesn't run on Python and can't even run flask. Being able to specify in a URL and have it return the files in a zip without knowing which files ahead of time is a dynamic function and easy to do with server-side scripting languages. We do have JavaScript available, which is how most of the magic happens already, so it might work if we can leverage that.

v923z commented 4 years ago

I understand that. The website is static, so it doesn't run on Python and can't even run flask. Being able to specify in a URL and have it return the files in a zip without knowing which files ahead of time is a dynamic function and easy to do with server-side scripting languages. We do have JavaScript available, which is how most of the magic happens already, so it might work if we can leverage that.

So you want to do this on client-side?

dhalbert commented 4 years ago

I was assuming we might need to get the webdev folks like @jwcooper involved to host the dynamic part elsewhere in our web infrastructure. It doesn't necessarily have to run under GitHub pages, though perhaps it's possible to do with JavaScript fetching the components of the zip and then pasting it together.

makermelissa commented 4 years ago

Ah, ok. Yes, if we had the dynamic service running elsewhere, that would work too. There's a library called JSZip that could assemble it though with JavaScript.

v923z commented 4 years ago

I was assuming we might need to get the webdev folks like @jwcooper involved to host the dynamic part elsewhere in our web infrastructure.

From Melissa's comments I gather that, although the web site is static, it is generated anew each time you change something in the code base. But if it is so, then you could have the board definitions on client side (in the HTML file itself), they would always be up-to-date, and then the browser could cobble the parts together.

dhalbert commented 4 years ago

The list of library files could be in the static site, or it could be fetched from some other place. Then some Javascript could make AJAX calls to get the individual library files, and then use JSZip. I don't know if we can then present the zip file as something for the user to download. I hope so. (I always run into security considerations when looking at things like this.)

makermelissa commented 4 years ago

From Melissa's comments I gather that, although the web site is static, it is generated anew each time you change something in the code base. But if it is so, then you could have the board definitions on client side (in the HTML file itself), they would always be up-to-date, and then the browser could cobble the parts together.

Yes, that would be pre-defined bundles as originally suggested and that would be doable for sure. Dan was suggesting dynamically built bundles for learn guides and that's what I was mostly addressing.

The list of library files could be in the static site, or it could be fetched from some other place. Then some Javascript could make AJAX calls to get the individual library files, and then use JSZip. I don't know if we can then present the zip file as something for the user to download. I hope so. (I always run into security considerations when looking at things like this.)

Yeah, that would probably work.

dhalbert commented 4 years ago

A Javascript client solution is probably less attractive to produce Learn Guide bundles, since you can't pass args to process to circuitpython.org. The Javascript would have to get downloaded with the Guide page. @jwcooper do you have any comment on the best mechanism?

v923z commented 4 years ago

From Melissa's comments I gather that, although the web site is static, it is generated anew each time you change something in the code base. But if it is so, then you could have the board definitions on client side (in the HTML file itself), they would always be up-to-date, and then the browser could cobble the parts together.

Yes, that would be pre-defined bundles as originally suggested and that would be doable for sure. Dan was suggesting dynamically built bundles for learn guides and that's what I was mostly addressing.

Oh, what I meant is that the browser would hold a list of the file locations, and then would fetch those files based on the address. That is dynamic.

v923z commented 4 years ago

A Javascript client solution is probably less attractive to produce Learn Guide bundles, since you can't pass args to process to circuitpython.org. The Javascript would have to get downloaded with the Guide page. @jwcooper do you have any comment on the best mechanism?

Your original example, https://circuitpython.org/library-zip?libs=lsm6ds,lis3mdl,apds9960,sht31d,bmp280,register,bus_device&project=sense_board&version=5.x would simply return an HTML file, always the same, no matter what. The browser would then parse the address.

makermelissa commented 4 years ago

Your original example, https://circuitpython.org/library-zip?libs=lsm6ds,lis3mdl,apds9960,sht31d,bmp280,register,bus_device&project=sense_board&version=5.x would simply return an HTML file, always the same, no matter what. The browser would then parse the address.

Pretty much, except a JSON file would probably work better. ;) Then it would pass the relevant files into JSzip and return the result to the user.

v923z commented 4 years ago

Pretty much, except a JSON file would probably work better. ;)

Details. :)

Then it would pass the relevant files into JSzip and return the result to the user.

Right.

siddacious commented 4 years ago

Indeed I believe it was @dglaude that put this idea in my head a while back ;)

I for one love this approach! The idea of a web service that will return a bundle based on an arbitrary list of libs is great. I think it would also be useful to have a complementary web service for CRUD-ing named bundles with a static URL where the content of the URL would be kept up to date with the current version of the libs.

We could then provide an AJAX-ey UI for guide authors and people registering new boards to create a bundle for their thing and then provide them with a URL to link to in their guide or board definition.

Edit: It would be cool if the library-returning webservice (which I'll call the Librarian) could take an updated_since=YYYY-MM-DD parameter that only returns the subset of the requested libraries that have changed since the given date. The bundle CRUD webservice (which I'll call the Bundler for lack of a more clever name) could then use this parameter to keep the registered bundles up to date.
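A small sketch of what the `updated_since` filter could look like. The parameter name comes from the comment above; the `LAST_UPDATED` table, its dates, the function name, and the choice to treat the cutoff date as inclusive are assumptions for illustration only.

```python
from datetime import date

# Hypothetical last-release dates; a real service would read these from
# release metadata rather than a hard-coded table.
LAST_UPDATED = {
    "adafruit_lsm6ds": date(2020, 5, 20),
    "adafruit_lis3mdl": date(2020, 6, 3),
    "adafruit_apds9960": date(2020, 4, 11),
}


def changed_since(requested, updated_since, last_updated=LAST_UPDATED):
    """Return the requested libraries updated on or after `updated_since`."""
    cutoff = date.fromisoformat(updated_since)  # expects YYYY-MM-DD
    return [name for name in requested if last_updated[name] >= cutoff]


wanted = ["adafruit_lsm6ds", "adafruit_lis3mdl", "adafruit_apds9960"]
print(changed_since(wanted, "2020-05-01"))
```

The Bundler described above could call this with the date of its last refresh and rebuild a registered bundle only when the returned list is non-empty.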

v923z commented 4 years ago

I have totally overlooked a fundamental issue: if the html file is stand-alone, and is supposed to get the data, how do you want to manage cross-origin requests? You need either a key from a provider, or your server has to allow that, and then you would have to host the data (files).

endico commented 3 years ago

Learning guides aren't the only place this would be used. I just released a small MagTag project on github with an eye toward making it as easy as possible for non-technical people to use. Currently there are ten libraries that need to be installed and with the refactoring I imagine that will increase. Instead of listing them all, I pointed people to the MagTag guide. It would be even nicer though if I could just give people a single link.

ladyada commented 3 years ago

yeah we would ideally use circup in some way. as our boards/projects get more complex, having it built into the IDE could be beneficial

makermelissa commented 3 years ago

Ok, so here's the thing. It doesn't appear we currently store the raw .py or .mpy files on Amazon or elsewhere, only zipped versions. I'm trying to figure out the best solution here, but would like input from others. Here are the options I see:

We can use the .zip files from the release assets from GitHub to build a bundle, but a zip file with more zip files will probably be annoying to the user because it's more work than just grabbing the bundle itself.

We can start storing .mpy and .py files on Amazon S3 somewhere like we do with all the CircuitPython releases, but I'm not sure what our limits are and this may take up a bunch of room unless we just keep the latest versions of everything.

We can use a server-side web service like php that can extract the zip files and rebuild new ones on the fly, but the initial setup is a bit more complicated and I'm not sure of the best spot to keep this and maintain it.

I'm open to additional suggestions as well.

dhalbert commented 3 years ago

We can start storing .mpy and .py files on Amazon S3 somewhere like we do with all the CircuitPython releases, but I'm not sure what our limits are and this may take up a bunch of room unless we just keep the latest versions of everything.

The CircuitPython builds on S3 are more than 3000 files for each merged pull request or release. The bundles are tiny in comparison. I periodically clean out these files and I could do the same on the bundles. But maybe you just need to keep the latest unpacked bundle anyway?

makermelissa commented 3 years ago

Alright, cool. Storing the .py and .mpy files on S3 sounds like the best solution at this point then, but I'll allow others to chime in too.

dhalbert commented 3 years ago

And I think it would be fine to store them in the same bucket as the builds, so we don't need another set of credentials. We can just make a new top-level folder.

makermelissa commented 3 years ago

Ok, I was just diving into JSZip a bit and it looks like it can do extraction of zip content, so Amazon may not be necessary. I'm going to get something working without it first and if it looks like it's needed, we can do that. But it is nice to know that it is an option.

sommersoft commented 3 years ago

Why not just create the bundles the same way that the Bundle repo's .github/workflows/release.yml does, utilizing circuitpython-build-tools? Could probably work it into the current workflow file, and have them built on each release. Would only need to establish how to store/use the data for the specific bundles.

I would love to have a just-in-time service, but that would require some serious caching mechanisms to overcome the bandwidth issues with always requesting assets from GH.

dhalbert commented 3 years ago

I would love to have a just-in-time service, but that would require some serious caching mechanisms to overcome the bandwidth issues with always requesting assets from GH.

We would store the assets on S3, so I don't think it's too much of an issue. There could be caching behind that, but I think we could see if it's necessary. It could all be client-side JavaScript, just fetch multiple files from S3.

[OK, I didn't see @makermelissa's reply two up about using the zips. But we could keep a copy of the zips on S3 anyway.]

makermelissa commented 3 years ago

Why not just create the bundles the same way that the Bundle repo's .github/workflows/release.yml does, utilizing circuitpython-build-tools? Could probably work it into the current workflow file, and have them built on each release. Would only need to establish how to store/use the data for the specific bundles.

I think the main issue with doing pre-built bundles is there are just too many permutations to make it practical.

I would love to have a just-in-time service, but that would require some serious caching mechanisms to overcome the bandwidth issues with always requesting assets from GH.

Right, which is why I was looking at S3.

At this point, I'm thinking mpy and py zips on s3 which will help overcome the issue of module vs package and make JSON file generation simpler.

makermelissa commented 3 years ago

Ok, I'm trying to figure out the best infrastructure for S3. Right now CircuitPython uses GitHub Actions, which works pretty well. It has the Amazon token as part of repo-specific secrets. Long term it would probably be best to have each repo upload its zip files to Amazon S3, but that would likely require another Actions sweep first and that would only upload the latest files.

Adabot has access as well and already has the secrets set up, so that looks pretty promising, but I didn't want to bog down the script too much if that's even really an issue. I suppose it could leverage circuitpython-build-tools to create the zip file, but it might be overkill.

I'm also thinking we can have adabot do an initial run of uploads and then upload either new or updated libraries from that point on. We're already creating some JSON files for outstanding issues which are uploaded via Actions, so I think it's just a matter of implementing something similar.

dhalbert commented 3 years ago

I think the bundle builder could just upload the files in the unpacked bundle. Otherwise every library repo has to know the S3 secret. Or it could be a webhook, like readthedocs, which does not require secrets. Nominally, that would require a server to answer the webhook, but you could use AWS Lambda to do this really easily, e.g. https://medium.com/mindorks/building-webhook-is-easy-using-aws-lambda-and-api-gateway-56f5e5c3a596 (many other examples can be found).

One advantage of doing it at bundle-building time is that a coordinated set of library changes can be made and then we wait for the once-a-day bundle builder to bundle the coordinated changes. The disadvantage is that the bundle builder only runs once a day, so a user must wait for library fixes or download them by hand.

(Lambda example of unpacking an uploaded zip: https://medium.com/@johnpaulhayes/how-extract-a-huge-zip-file-in-an-amazon-s3-bucket-by-using-aws-lambda-and-python-e32c6cf58f06.)

makermelissa commented 3 years ago

I'm ok with having it only update once a day since we generally direct people to use the bundle anyways.

makermelissa commented 3 years ago

Ok, I have the first piece of this finished. I decided to put the JSON file building code into the circuitpython-build-tools when the bundle is built and it will be attached as an asset with the bundle.

One of the complications I realized we would run into by having the files on Amazon S3 and the web files on GitHub is CORS, or Cross-Origin Resource Sharing, policies. This is a browser mechanism to help reduce Cross Site Scripting vulnerabilities, and if we had the files on S3 and the JavaScript directly accessing those files, we would run into issues. If we have everything on GitHub, it should be much better, which is the plan. My plan is to have it access the latest bundle file, extract the contents it needs, and just return those rather than accessing separate repos. That should help eliminate the GitHub call limitations, especially if it's being done through GitHub.
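The "access the latest bundle file, extract the contents it needs, and just return those" step can be sketched as follows. The actual work in this thread is done in the browser with JSZip; this is only an equivalent illustration with Python's `zipfile`. The function name, the `lib/` prefix, and the tiny fake bundle with placeholder contents are assumptions, and a real bundle zip may nest `lib/` under a versioned top-level folder.

```python
import io
import zipfile


def extract_libraries(bundle_bytes, wanted, lib_dir="lib/"):
    """Copy only the wanted libraries from a bundle zip into a smaller zip."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as small:
            for info in bundle.infolist():
                if info.is_dir() or not info.filename.startswith(lib_dir):
                    continue
                relative = info.filename[len(lib_dir):]
                # "name.mpy" is a single-file module; "name/..." is a package.
                top_level = relative.split("/", 1)[0].split(".", 1)[0]
                if top_level in wanted:
                    small.writestr(info.filename, bundle.read(info))
    return out.getvalue()


# Build a tiny fake bundle so the sketch runs without any download.
fake = io.BytesIO()
with zipfile.ZipFile(fake, "w") as z:
    z.writestr("lib/adafruit_ssd1305.mpy", b"module")
    z.writestr("lib/adafruit_ht16k33/__init__.mpy", b"package init")
    z.writestr("lib/adafruit_ht16k33/segments.mpy", b"package part")
    z.writestr("lib/adafruit_bmp280.mpy", b"not requested")

small = extract_libraries(fake.getvalue(), {"adafruit_ssd1305", "adafruit_ht16k33"})
with zipfile.ZipFile(io.BytesIO(small)) as check:
    print(sorted(check.namelist()))
```

Matching on the first path component handles the module-versus-package difference in one place: a single file and a whole folder are both selected by the same library name.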

I am working on a demo that uses JSZip to access the zip files and will dynamically build and redirect the user to a smaller zip package. Once I can get that piece working, I should have enough things in place to get this done and I'm hoping to have a proof of concept working today, though JSZip is turning out to be a little less well documented than I thought, so it's a lot of trial and error.

jwcooper commented 3 years ago

This is great, and should work well for learn.adafruit.com as well.

Are you expecting the first bundle with the new JSON file to be built tonight?

makermelissa commented 3 years ago

Yes, it should be an asset for the bundle.

makermelissa commented 3 years ago

Demo is working! https://tranquil-massive-cilantro.glitch.me/ (random name generated by glitch) I have it downloading adafruit_ssd1305 and adafruit_ht16k33 by default, but you can specify a libs parameter. It will download any dependencies as well. It just needs a bit more polish, but this looks like it is totally doable using JavaScript. Currently the JSON and zip files reside on glitch so I didn't have to deal with CORS policies and I need to have it detect the latest bundle version still.
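Pulling in dependencies automatically amounts to walking a dependency table until nothing new turns up. A minimal sketch, assuming the bundle's JSON file maps each library name to an object with a `dependencies` list; that shape, the function name, and the dependency lists shown here are invented for illustration and are not taken from the real file.

```python
import json

# Made-up metadata in an assumed shape; not the real bundle JSON.
BUNDLE_JSON = """
{
  "adafruit_ht16k33": {"dependencies": ["adafruit_bus_device"]},
  "adafruit_ssd1305": {"dependencies": ["adafruit_bus_device", "adafruit_framebuf"]},
  "adafruit_bus_device": {"dependencies": []},
  "adafruit_framebuf": {"dependencies": []}
}
"""


def with_dependencies(requested, metadata):
    """Return the requested libraries plus all transitive dependencies."""
    resolved = []
    pending = list(requested)
    while pending:
        name = pending.pop(0)
        if name in resolved:
            continue  # already handled; also guards against dependency cycles
        resolved.append(name)
        pending.extend(metadata[name]["dependencies"])
    return resolved


metadata = json.loads(BUNDLE_JSON)
print(with_dependencies(["adafruit_ht16k33", "adafruit_ssd1305"], metadata))
```

The resulting list is what would then be passed to the zip-building step, so a guide link only has to name the libraries the project imports directly.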

ladyada commented 3 years ago

yay for testing, booo for massive cilantro which sounds like my own personal nightmare (yuk!)

sommersoft commented 3 years ago

Demo is working!

Nice! It's quite speedy, to boot.

I have it downloading adafruit_ssd1305 and adafruit_ht16k33 by default

The ht16k33 folder was empty when I tried it a couple times, but probably a minor issue.

Currently the JSON and zip files reside on glitch so I didn't have to deal with CORS policies and I need to have it detect the latest bundle version still.

The libraries page JS might give you a head start on grabbing the latest version of the JSON file: https://github.com/adafruit/circuitpython-org/blob/master/assets/javascript/libraries.js

And while I love cilantro...tranquil, massive cilantro does sound ominous. 👻

makermelissa commented 3 years ago

The ht16k33 folder was empty when I tried it a couple times, but probably a minor issue.

Oh yeah, I just noticed mine is too. I'll look into this next.

The libraries page JS might give you a head start on grabbing the latest version of the JSON file: https://github.com/adafruit/circuitpython-org/blob/master/assets/javascript/libraries.js

Thanks

And while I love cilantro...tranquil, massive cilantro does sound ominous. 👻

Yeah, I should probably change the name... New URL is https://adafruit-dynamic-bundler.glitch.me/

makermelissa commented 3 years ago

Ok, the empty folder issue is fixed. It turns out a required JSZip function was not an asynchronous function, but rather used callbacks, so it basically added the files after the download initiated. Easy enough fix using promises.
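The bug described here is specific to JavaScript callbacks and promises, but the same ordering mistake can be shown by analogy in Python's `asyncio`. This is not the JSZip code; every name below is hypothetical, and the `sleep` call simply stands in for the time a fetch or decompression takes.

```python
import asyncio


async def add_file(archive, name):
    """Pretend to fetch a file, then record it in the archive."""
    await asyncio.sleep(0.01)  # stands in for network or decompression time
    archive.append(name)


async def build_too_early(names):
    """Take the 'download' snapshot before the pending work has finished."""
    archive = []
    tasks = [asyncio.create_task(add_file(archive, n)) for n in names]
    snapshot = list(archive)  # nothing has been added yet
    await asyncio.gather(*tasks)
    return snapshot


async def build_after_waiting(names):
    """Wait for every file to be added, then take the snapshot."""
    archive = []
    await asyncio.gather(*(add_file(archive, n) for n in names))
    return list(archive)


names = ["__init__.mpy", "segments.mpy"]
print(asyncio.run(build_too_early(names)))
print(asyncio.run(build_after_waiting(names)))
```

The first function returns an empty list, which is the analogue of the empty folder in the downloaded zip; the second waits for all pending work before producing the result.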

makermelissa commented 3 years ago

I have been tackling the CORS issue. I set up a GitHub IO website with these pages hoping that would fix the issue, but it was still running into the same error since the domains are different.

I worked around it for now by using a public CORS proxy, so it's getting the zip and JSON data from GitHub successfully, and this is very close to being done. In terms of security, this is fine because everything is public already. However, I don't want to be reliant on somebody else's service that could go down at any time, so I'm looking into whether there's a way to have our own CORS proxy, or figuring out how to not need one at all.

Oh, and even with the use of a CORS proxy and accessing GitHub files remotely, it's still fast. :)

jwcooper commented 3 years ago

Back to considering S3, I haven't tested it, and not sure if it's already been tried, but could we configure the S3 bucket to allow GET from * using the AWS S3 CORS configuration?

makermelissa commented 3 years ago

I just tested a random file on S3 and still get the CORS error, but enabling the setting might work. I would need some AWS credentials set up on the circuitpython-build-tools repo to have it upload the bundle and JSON files there as well. Let me see if I can get this working first. I still have a few tricks up my sleeve to try.

jwcooper commented 3 years ago

I can set up your credentials for S3, or enable CORS on the S3 bucket (I haven't done this yet; I just checked and the settings are blank), if we go that route.

makermelissa commented 3 years ago

Ok @jwcooper, let's go ahead and try using S3. It looks like none of my tricks are working.

makermelissa commented 3 years ago

Great news! Using S3 solves the CORS problem. For the moment, I only have it allowing a couple test sites I'm using and the Adafruit GitHub pages domain (where it will ultimately end up). We can add other domains as well or just open it wide up.
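For reference, the two options mentioned here (an allow-list of sites versus opening it wide up) differ only in the list of allowed origins. The sketch below builds a rule set in the shape that boto3's `put_bucket_cors` accepts as its `CORSConfiguration` argument; it only constructs and prints the dictionary and makes no AWS call. The origins shown are placeholders, since the thread does not list the actual allowed sites.

```python
import json


def cors_rules(allowed_origins):
    """Build a read-only (GET) CORS rule set; "*" allows any site."""
    return {
        "CORSRules": [
            {
                "AllowedMethods": ["GET"],
                "AllowedOrigins": list(allowed_origins),
                "AllowedHeaders": ["*"],
            }
        ]
    }


# Placeholder origins for illustration only.
restricted = cors_rules(["https://adafruit.github.io", "https://example.test"])
wide_open = cors_rules(["*"])
print(json.dumps(restricted, indent=2))
```

Because only GET is allowed, either variant lets a page read the public bundle files without permitting any writes to the bucket.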

makermelissa commented 3 years ago

Done! See https://github.com/adafruit/Adafruit_Dynamic_Bundler. I added some parameters and examples to the Readme.