MIT-LCP / physionet-build

The new PhysioNet platform.
https://physionet.org/
BSD 3-Clause "New" or "Revised" License
56 stars 20 forks source link

Use project slug for generating name of zipfiles, instead of project title #1674

Open tompollard opened 2 years ago

tompollard commented 2 years ago

We generate a zipfile containing all files within a project. At the moment, it looks like the filename of this zipfile is based on the project title. It would make much more sense for the filename to be based on the project slug, I think.

The slug is unique, it doesn't have characters like spaces, apostrophes, etc, and it is short. Titles are much less suited to a filename. The function that needs a rethink is copied below:

https://github.com/MIT-LCP/physionet-build/blob/141d2a9262e6b9191e3bf81f3ccee9485c43e9cf/physionet-django/project/modelcomponents/publishedproject.py#L112-L126

amitupreti commented 1 year ago

Hi @tompollard , I had one concern about this.

Wouldn't changing the names of zipfile mess up the existing zips that were already created?

After checking the codebase, my understanding is that there are several parts of code that initiate zip of project files(which uses the zip_name function and create zip of the project files, and there are other part in code that call this zip_name function again when someone tries to retrieve the zip file for the project.

So that means when we update this function, the retrieval of zip will break for a lot of existing projects.

I assume to solve that, we need to retrigger the zip of project files, and that might take a lot of resources depending on the number of projects and the file size.. Plus i am not sure if we keep a entry in database, if the project files have been zipped

tompollard commented 1 year ago

Wouldn't changing the names of zipfile mess up the existing zips that were already created?

Yes, this is true. Part of the work would be fixing the current filenames (on our servers, and on the cloud servers). I don't think this would be a huge effort, and if we think it is necessary then it would be better to do sooner than later.

amitupreti commented 1 year ago

Good point. Do you think somekind of management command might be a good idea to do that? If feels weird to have a management command for a one off thing. (but i can't think of anything else right now)

any suggestions?

tompollard commented 1 year ago

Good point. Do you think somekind of management command might be a good idea to do that? If feels weird to have a management command for a one off thing. (but i can't think of anything else right now). any suggestions?

@bemoody probably has thoughts on this, but I think a management command for the migration would be fine (and it could always be removed later).

bemoody commented 1 year ago

Could be a data migration (RunPython) that creates a hard link if the old zip file already exists. Or we could just do it by hand. Don't want to delete the old files (at least not immediately) because clients are in the process of downloading them.

bemoody commented 1 year ago

Note this problem just came up again (vindr-cxr 1.0.0).

bemoody commented 1 year ago

It's uglier than I thought. The title isn't just used to name the zip file, it's also used as the path inside the zip file ("enclosing_folder", see LocalProjectFiles.make_zip).

So maybe better to regenerate the zip files as well as renaming them.

tompollard commented 11 months ago

https://github.com/MIT-LCP/physionet-build/pull/2134 updates zip_name to use more sensible filenames, but includes a legacy argument that defaults to the old logic. Can we now: