ag-computational-bio / bakrep-web

The user interface for bakrep

Add genome sequences to the repository #55

Closed: lukasjelonek closed this issue 10 months ago

lukasjelonek commented 11 months ago

During the first initialization of the repository, the genome FASTA files were accidentally left out. They need to be uploaded as well. A good moment to include them would be when the metadata is added, see #51

lukasjelonek commented 11 months ago

External discussions revealed that the genome FASTA files are not part of the workflow results. Nevertheless, this data should be available for download.

I see multiple options:

The extraction and generation options would require us to implement the functionality both in the website and in the download tool. Referencing the original source requires the ability to add links to the datasets on the server side. The dataset is already designed to contain arbitrary links, but at the moment they cannot be created via the API or the upload tool, so this feature must be implemented.

I prefer the last option. Although it may require more programming on the server side, it should reduce the overall amount of work compared with either rerunning the whole analysis and upload, or implementing data extraction in two places.

lukasjelonek commented 11 months ago

Tasks

lukasjelonek commented 11 months ago

The dataset-to-assembly-URL mapping is available in <projectvolume>/upload/assemblies/assembly-urls.tsv
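A minimal Python sketch of loading this mapping into a dictionary. It assumes, hypothetically, that each row holds two tab-separated columns (dataset id, then assembly URL) with no header line; the actual column layout of assembly-urls.tsv is not stated in this issue. The function name load_assembly_urls and the ids and URLs in the example are made-up placeholders.

```python
import csv
import io
from typing import Dict, TextIO


def load_assembly_urls(handle: TextIO) -> Dict[str, str]:
    """Read a tab-separated dataset -> assembly URL mapping into a dict.

    Hypothetical layout: column 0 is the dataset id, column 1 the URL, no header.
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        dataset_id, url = row[0], row[1]
        mapping[dataset_id] = url
    return mapping


if __name__ == "__main__":
    # Placeholder data standing in for the real assembly-urls.tsv
    example = io.StringIO(
        "dataset1\thttp://ftp.example.org/a.fna.gz\n"
        "dataset2\thttp://ftp.example.org/b.fna.gz\n"
    )
    print(load_assembly_urls(example))
```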

Since all assemblies have already been downloaded, it should be possible to compute the MD5 checksums and file sizes from the local copies.
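A minimal sketch of computing both values for one local file, assuming the checksum should cover the exact bytes a user will download (for example the compressed file if the URL points to a .gz). The helper name md5_and_size is hypothetical. The file is read in chunks, so large assemblies do not have to fit in memory.

```python
import hashlib
import os
import sys
from typing import Tuple


def md5_and_size(path: str, chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Return (hex MD5 digest, size in bytes) of a local file."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


if __name__ == "__main__":
    # Usage: python md5_sizes.py file1.fna.gz file2.fna.gz ...
    for path in sys.argv[1:]:
        checksum, size = md5_and_size(path)
        print(path, checksum, size, sep="\t")
```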

lukasjelonek commented 10 months ago

The implementation is complete. Now the data has to be processed further and uploaded.

lukasjelonek commented 10 months ago

The genome sequences have been uploaded, but unfortunately with HTTP URLs instead of HTTPS URLs. This causes security warnings when downloading the files in a browser. The ENA FTP site is also available via HTTPS, so it should be sufficient to re-upload all assembly URLs with the https scheme.
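A minimal sketch of the scheme rewrite, assuming the URLs live in a tab-separated file such as assembly-urls.tsv and that the URL sits in a hypothetical url_column (index 1 by default). Only the scheme is changed; host, path and query are kept. It also assumes the URLs carry no explicit :80 port, which would be wrong under HTTPS. The function names are illustrative, not part of the bakrep tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Switch an http URL to https; URLs with any other scheme are returned unchanged.

    Assumes no explicit ':80' port in the netloc (it would be kept as-is).
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def rewrite_tsv_line(line: str, url_column: int = 1) -> str:
    """Rewrite the URL in one tab-separated row (url_column is a hypothetical default)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) > url_column:
        cols[url_column] = to_https(cols[url_column])
    return "\t".join(cols) + "\n"


def rewrite_file(src_path: str, dst_path: str, url_column: int = 1) -> None:
    """Write a copy of src_path with every URL in url_column switched to https."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(rewrite_tsv_line(line, url_column))
```

Parsing with urllib.parse rather than a plain string replace ensures that only a leading http scheme is touched; the rewritten file could then be fed back through the upload tool.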

TODO