UpstreamDataInc / goosebit

A simplistic, opinionated remote update server implementing hawkBit™'s DDI API.
https://goosebit.rtfd.io
Apache License 2.0
18 stars 3 forks source link

API for CI integration #81

Closed easybe closed 2 months ago

easybe commented 2 months ago

I would like to be able to do the following through the API:

Upload firmware

I would expect to be able to POST to /api/v1/firmware and either provide a path or an URL to a file.

I guess something like this could work:

from fastapi import File, UploadFile, Form
(...)
async def firmware_post(
    file: Optional[UploadFile] = File(None),
    url: Optional[str] = Form(None)
):
    if file:
        # Process the uploaded file
    elif url:
        # Process the firmware URL
    else:
        # Return an error

Query firmware

Get the ID of a firmware entry by providing the URL/filename:

GET /api/v1/firmware?url=https://...
b-rowan commented 2 months ago

Would it make sense to have the file upload be a PUT method, and the url handler be a POST method? Not fully opposed to combining them, but if there is a relatively idiomatic way to separate them I would prefer that.

tsagadar commented 2 months ago

From a technical perspective:

Tricky part is how the system should work from a user perspective:

  1. Do we allow to create two file-based firmwares with the same checksum? Rather not, as it points towards a user error to have two firmwares with the same binary?
  2. Do we allow to create two URL-based firmwares with the same URL? If checksum is still the same, then we are in the case above. If the checksum now differs, then the file got replaced on the Web server and the old firmware is invalid (checksum and file no longer will match). So should it then create or update the firmware (see next bullet)?
  3. Should firmware objects be immutable? There IDs are used as action_ids to track what gets installed on the device. If changing firmware at a later stage would be allowed, this is no longer correct. Similar argument for rollouts: they captured statistics on success and failure of installing a firmware. If firmwares do change after a while, the stats become useless.

Proposal: only offer POST and not PUT/GET. POST returns 200 OK if firmware with same checksum already exists (maybe also include a message in the body that states this). Return a 201 Created if a new firmware got created - because file was never uploaded / URL leads to a checksum that does not exists.

Not sure if this really makes sense - should sleep over it. But posting this to invite for comments.

b-rowan commented 2 months ago

Proposal: only offer POST and not PUT/GET. POST returns 200 OK if firmware with same checksum already exists (maybe also include a message in the body that states this). Return a 201 Created if a new firmware got created - because file was never uploaded / URL leads to a checksum that does not exists.

I like this idea. My concern was, how do we separate local file uploads from remote file uploads in an idiomatic way? I think combine the endpoints is fine if needed, but I think it makes it much harder to read and understand.

I suppose I should phrase my question better - Is there a way we can separate these types of upload into 2 separate endpoints for readability?

b-rowan commented 2 months ago

Proposal: only offer POST and not PUT/GET. POST returns 200 OK if firmware with same checksum already exists (maybe also include a message in the body that states this). Return a 201 Created if a new firmware got created - because file was never uploaded / URL leads to a checksum that does not exists.

OK, after thinking about this some more, I think this proposal is correct, but seems like its missing some edge cases. Here are the situations that could happen and how I think we should deal with them (if moving forward in alignment with this proposal) -

tsagadar commented 2 months ago

Yes: overlooked the case of a filename clash. As filenames no longer have any meanings it would be nice if such clashes simply would not occur. Initial idea was to create a subfolder for each software - but it is a chicken/egg issue. The subfolder name ideally is the software id - but we only get this id, once the record is created pointing towards the proper file...

Maybe an ok solution: reference the temp file initially, create software, move file, update software?

easybe commented 2 months ago

Yes: overlooked the case of a filename clash. As filenames no longer have any meanings it would be nice if such clashes simply would not occur. Initial idea was to create a subfolder for each software - but it is a chicken/egg issue. The subfolder name ideally is the software id - but we only get this id, once the record is created pointing towards the proper file...

Maybe an ok solution: reference the temp file initially, create software, move file, update software?

Alternatively, we could use the first two characters of the SHA-1 for the directory. That is how hawkBit or other software like Artifactory or Git does it.

tsagadar commented 2 months ago

... reducing the collision likelihood by 256? What do I overlook?

easybe commented 2 months ago

... reducing the collision likelihood by 256? What do I overlook?

You are right, the filename also needs to be the rest of the hash.

b-rowan commented 2 months ago

I don't see why we wouldn't use more characters of the hash, or even the entire thing? We store it in the database anyway, so might as well just compute it initially, then store it with the hash.

3 choices here then -

  1. File name stays, but it goes into a folder /{hash},
  2. File name is changed to be something consistent (such as artifact.swu), the it goes into the same folder,
  3. File name just becomes the hash.

I prefer #2 personally, what do you guys think?

tsagadar commented 2 months ago

Number 3 seems the simplest approach - but I don't have a strong preference or see big advantages between one or the other.

However: likely it leads to an adjustment of the data model. We no longer want to store a file:// uri but rather the original file name (so that we can still show it to the user in the UI). Re-using the uri attribute for a file name is not very nice.

b-rowan commented 2 months ago

Number 3 seems the simplest approach - but I don't have a strong preference or see big advantages between one or the other.

However: likely it leads to an adjustment of the data model. We no longer want to store a file:// uri but rather the original file name (so that we can still show it to the user in the UI). Re-using the uri attribute for a file name is not very nice.

Issue here is we need to keep the uri anyway, since remote stuff needs it.

tsagadar commented 2 months ago

Yes, uri is needed for remote. So for local we have some options

  1. Store filename.swu in the uri
  2. Store file:///filename.swu - knowing it does not point to any local file directly
  3. Introduce an additional attribute original_file_name (or simiular)

Maybe 2. is ok.

b-rowan commented 2 months ago

Yes, uri is needed for remote. So for local we have some options

1. Store `filename.swu` in the uri

2. Store `file:///filename.swu` - knowing it does not point to any local file directly

3. Introduce an additional attribute original_file_name (or simiular)

Maybe 2. is ok.

  1. in this comment or above? I assume 2 here, but then I think it makes sense to use 1. from my comment. Then we drop filename, and just use the last part of the path as filename. So the idea would be -

Local files: file:///{hash}/foo.swu - filename foo.swu file:///{hash}/bar.swu - filename bar.swu

Remote files: https://my-server.io/some_path/foo.swu - filename foo.swu https://my-server.io/some_path/bar.swu - filename bar.swu

tsagadar commented 2 months ago

Sounds good.

b-rowan commented 2 months ago

Yes, uri is needed for remote. So for local we have some options

1. Store `filename.swu` in the uri

2. Store `file:///filename.swu` - knowing it does not point to any local file directly

3. Introduce an additional attribute original_file_name (or simiular)

Maybe 2. is ok.

2. in this comment or above?  I assume 2 here, but then I think it makes sense to use 1. from my comment.  Then we drop filename, and just use the last part of the path as filename.  So the idea would be -

Local files: file:///{hash}/foo.swu - filename foo.swu file:///{hash}/bar.swu - filename bar.swu

Remote files: https://my-server.io/some_path/foo.swu - filename foo.swu https://my-server.io/some_path/bar.swu - filename bar.swu

Will put this into a separate PR eventually.