broadinstitute / single_cell_portal_core

Rails/Docker application for the Broad Institute's single cell RNA-seq data portal
https://singlecell.broadinstitute.org
BSD 3-Clause "New" or "Revised" License
62 stars 26 forks source link

Ignore unsynced directories in bulk download (SCP-4549) #2149

Closed bistline closed 2 weeks ago

bistline commented 3 weeks ago

BACKGROUND & CHANGES

This fixes a corner case where older studies that used the sync feature may have created DirectoryListing objects for files in the _scp_internal folder. Even if they didn't saved them, the objects are still persisted in the database (there's a sync_status flag for determining if they used for downloads or not). Then, when issuing a single-study bulk download call for that study, it will attempt to load those files, resulting in errors if the files have been removed.

Now, only synced directories are allowed. Previous work in #1598 already ignores these files in _scp_internal, but at least one study on production had already created DirectoryListing entry before this was merged.

MANUAL TESTING

  1. Enter a Rails console session and load any public, non-detached study that has files:
    study = Study.where(detached: false, public: true).select {|s| s.study_files.any? }.sample
  2. Add an unsynced DirectoryListing:
    
    files = 1.upto(30).map do |i|
    {
    name: "_scp_internal/subdir/output_#{i}.tsv",
    size: i * 100,
    generation: SecureRandom.random_number(10000..99999)
    }
    end
    => 
    [{:name=>"_scp_internal/subdir/output_1.tsv", :size=>100, :generation=>52683},
    ...

directory = DirectoryListing.create(study:, file_type: 'tsv', name: '_scp_internal', files:, sync_status: false) => #<DirectoryListing _id: 66fd7c8094ec8f2cdeedd47e, created_at: 2024-10-02 17:01:52.848284 UTC, updated_at: 2024-10-02 17:01:52.848284...

3. In a browser, sign in with a **non-admin account** and load the above study
4. Click the bulk download button and issue the request
5. In `development.log` you should see a similar entry with no errors about missing files:

Started GET "/single_cell/api/v1/bulk_download/generate_curl_config?accessions=SCP105&auth_code=nk04LPnr&directory=all&context=study" for 127.0.0.1 at 2024-10-02 13:16:28 -0400 Processing by Api::V1::BulkDownloadController#generate_curl_config as JSON Parameters: {"accessions"=>"SCP105", "auth_code"=>"nk04LPnr", "directory"=>"all", "context"=>"study"} Authenticating user via auth_token: nk04LPnr FireCloud API request (GET) https://api.firecloud.org/api/groups with tracking identifier: 60903e40e24139898f9729d0 Adding 7795450328 bytes to user: 60903e40e24139898f9729d0 download quota for bulk download Beginning creation of curl configuration for user_id, auth token: 60903e40e24139898f9729d0 2024-10-02 13:16:33 -0400: Logging analytics to Mixpanel for event name: file-download:curl-config 2024-10-02 13:16:33 -0400: Posting to Mixpanel. Params: {:url=>"https://terra-bard-dev.appspot.com/api/event", :headers=>{"Content-Type"=>"application/json", "Authorization"=>"Bearer [REDACTED]"}, :payload=>"{\"event\":\"file-download:curl-config\",\"properties\":{\"numFiles\":8,\"numMetadataFiles\":1,\"numExpressionFiles\":2,\"numClusterFiles\":1,\"numStudies\":1,\"studyAccessions\":[\"SCP105\"],\"numAzulStudies\":0,\"azulAccessions\":[],\"numAzulFiles\":0,\"numAzulAnalysisFiles\":0,\"numAzulSequenceFiles\":0,\"context\":\"study\",\"os\":\"Unknown\",\"appId\":\"single-cell-portal\",\"env\":\"development\",\"logger\":\"app-backend\",\"authenticated\":true,\"registeredForTerra\":true,\"abTests\":[]}}", :method=>"POST"} 2024-10-02 13:16:34 -0400: Bard error in call to https://terra-bard-dev.appspot.com/api/event: 503 Service Unavailable Curl configs generated for studies ["SCP105"], 8 total files Total time in generating curl configuration: 2 Seconds Rendering text template Rendered text template (Duration: 0.2ms | Allocations: 29) Sent data cfg.txt (10.6ms) Completed 200 OK in 5502ms (Views: 9.9ms | MongoDB: 3.1ms | Allocations: 251821)

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 69.54%. Comparing base (d3059c8) to head (4f6f387). Report is 31 commits behind head on development.

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2149/graphs/tree.svg?width=650&height=150&src=pr&token=HMWE5BO2a4&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2149?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) ```diff @@ Coverage Diff @@ ## development #2149 +/- ## =============================================== - Coverage 69.56% 69.54% -0.02% =============================================== Files 331 331 Lines 27948 27948 Branches 2440 2440 =============================================== - Hits 19441 19436 -5 - Misses 8353 8358 +5 Partials 154 154 ``` | [Files with missing lines](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2149?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute) | Coverage Δ | | |---|---|---| | [app/controllers/api/v1/bulk\_download\_controller.rb](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2149?src=pr&el=tree&filepath=app%2Fcontrollers%2Fapi%2Fv1%2Fbulk_download_controller.rb&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute#diff-YXBwL2NvbnRyb2xsZXJzL2FwaS92MS9idWxrX2Rvd25sb2FkX2NvbnRyb2xsZXIucmI=) | `94.69% <100.00%> (ø)` | | ... and [2 files with indirect coverage changes](https://app.codecov.io/gh/broadinstitute/single_cell_portal_core/pull/2149/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=broadinstitute)