Open tompollard opened 2 years ago
Hi @tompollard , I had one concern about this.
Wouldn't changing the names of zipfile mess up the existing zips that were already created?
After checking the codebase, my understanding is that there are several parts of code that initiate zip of project files(which uses the zip_name
function and create zip of the project files, and there are other part in code that call this zip_name
function again when someone tries to retrieve the zip file for the project.
So that means when we update this function, the retrieval of zip will break for a lot of existing projects.
I assume to solve that, we need to retrigger the zip of project files, and that might take a lot of resources depending on the number of projects and the file size.. Plus i am not sure if we keep a entry in database, if the project files have been zipped
Wouldn't changing the names of zipfile mess up the existing zips that were already created?
Yes, this is true. Part of the work would be fixing the current filenames (on our servers, and on the cloud servers). I don't think this would be a huge effort, and if we think it is necessary then it would be better to do sooner than later.
Good point. Do you think somekind of management command might be a good idea to do that? If feels weird to have a management command for a one off thing. (but i can't think of anything else right now)
any suggestions?
Good point. Do you think somekind of management command might be a good idea to do that? If feels weird to have a management command for a one off thing. (but i can't think of anything else right now). any suggestions?
@bemoody probably has thoughts on this, but I think a management command for the migration would be fine (and it could always be removed later).
Could be a data migration (RunPython) that creates a hard link if the old zip file already exists. Or we could just do it by hand. Don't want to delete the old files (at least not immediately) because clients are in the process of downloading them.
Note this problem just came up again (vindr-cxr 1.0.0).
It's uglier than I thought. The title isn't just used to name the zip file, it's also used as the path inside the zip file ("enclosing_folder", see LocalProjectFiles.make_zip).
So maybe better to regenerate the zip files as well as renaming them.
https://github.com/MIT-LCP/physionet-build/pull/2134 updates zip_name
to use more sensible filenames, but includes a legacy
argument that defaults to the old logic. Can we now:
legacy=False
(or remove it altogether?)try: new_zipfile; except legacy_zipfile
)?
We generate a zipfile containing all files within a project. At the moment, it looks like the filename of this zipfile is based on the project title. It would make much more sense for the filename to be based on the project slug, I think.
The slug is unique, it doesn't have characters like spaces, apostrophes, etc, and it is short. Titles are much less suited to a filename. The function that needs a rethink is copied below:
https://github.com/MIT-LCP/physionet-build/blob/141d2a9262e6b9191e3bf81f3ccee9485c43e9cf/physionet-django/project/modelcomponents/publishedproject.py#L112-L126