This fixes a corner case where older studies that used the sync feature may have created DirectoryListing objects for files in the _scp_internal folder. Even if they didn't saved them, the objects are still persisted in the database (there's a sync_status flag for determining if they used for downloads or not). Then, when issuing a single-study bulk download call for that study, it will attempt to load those files, resulting in errors if the files have been removed.
Now, only synced directories are allowed. Previous work in #1598 already ignores these files in _scp_internal, but at least one study on production had already created DirectoryListing entry before this was merged.
MANUAL TESTING
Enter a Rails console session and load any public, non-detached study that has files:
study = Study.where(detached: false, public: true).select {|s| s.study_files.any? }.sample
Add an unsynced DirectoryListing:
files = 1.upto(30).map do |i|
{
name: "_scp_internal/subdir/output_#{i}.tsv",
size: i * 100,
generation: SecureRandom.random_number(10000..99999)
}
end
=>
[{:name=>"_scp_internal/subdir/output_1.tsv", :size=>100, :generation=>52683},
...
3. In a browser, sign in with a **non-admin account** and load the above study
4. Click the bulk download button and issue the request
5. In `development.log` you should see a similar entry with no errors about missing files:
Started GET "/single_cell/api/v1/bulk_download/generate_curl_config?accessions=SCP105&auth_code=nk04LPnr&directory=all&context=study" for 127.0.0.1 at 2024-10-02 13:16:28 -0400
Processing by Api::V1::BulkDownloadController#generate_curl_config as JSON
Parameters: {"accessions"=>"SCP105", "auth_code"=>"nk04LPnr", "directory"=>"all", "context"=>"study"}
Authenticating user via auth_token: nk04LPnr
FireCloud API request (GET) https://api.firecloud.org/api/groups with tracking identifier: 60903e40e24139898f9729d0
Adding 7795450328 bytes to user: 60903e40e24139898f9729d0 download quota for bulk download
Beginning creation of curl configuration for user_id, auth token: 60903e40e24139898f9729d0
2024-10-02 13:16:33 -0400: Logging analytics to Mixpanel for event name: file-download:curl-config
2024-10-02 13:16:33 -0400: Posting to Mixpanel. Params: {:url=>"https://terra-bard-dev.appspot.com/api/event", :headers=>{"Content-Type"=>"application/json", "Authorization"=>"Bearer [REDACTED]"}, :payload=>"{\"event\":\"file-download:curl-config\",\"properties\":{\"numFiles\":8,\"numMetadataFiles\":1,\"numExpressionFiles\":2,\"numClusterFiles\":1,\"numStudies\":1,\"studyAccessions\":[\"SCP105\"],\"numAzulStudies\":0,\"azulAccessions\":[],\"numAzulFiles\":0,\"numAzulAnalysisFiles\":0,\"numAzulSequenceFiles\":0,\"context\":\"study\",\"os\":\"Unknown\",\"appId\":\"single-cell-portal\",\"env\":\"development\",\"logger\":\"app-backend\",\"authenticated\":true,\"registeredForTerra\":true,\"abTests\":[]}}", :method=>"POST"}
2024-10-02 13:16:34 -0400: Bard error in call to https://terra-bard-dev.appspot.com/api/event: 503 Service Unavailable
Curl configs generated for studies ["SCP105"], 8 total files
Total time in generating curl configuration: 2 Seconds
Rendering text template
Rendered text template (Duration: 0.2ms | Allocations: 29)
Sent data cfg.txt (10.6ms)
Completed 200 OK in 5502ms (Views: 9.9ms | MongoDB: 3.1ms | Allocations: 251821)
BACKGROUND & CHANGES
This fixes a corner case where older studies that used the sync feature may have created
DirectoryListing
objects for files in the_scp_internal
folder. Even if they didn't saved them, the objects are still persisted in the database (there's async_status
flag for determining if they used for downloads or not). Then, when issuing a single-study bulk download call for that study, it will attempt to load those files, resulting in errors if the files have been removed.Now, only synced directories are allowed. Previous work in #1598 already ignores these files in
_scp_internal
, but at least one study on production had already createdDirectoryListing
entry before this was merged.MANUAL TESTING
DirectoryListing
:directory = DirectoryListing.create(study:, file_type: 'tsv', name: '_scp_internal', files:, sync_status: false) => #<DirectoryListing _id: 66fd7c8094ec8f2cdeedd47e, created_at: 2024-10-02 17:01:52.848284 UTC, updated_at: 2024-10-02 17:01:52.848284...
Started GET "/single_cell/api/v1/bulk_download/generate_curl_config?accessions=SCP105&auth_code=nk04LPnr&directory=all&context=study" for 127.0.0.1 at 2024-10-02 13:16:28 -0400 Processing by Api::V1::BulkDownloadController#generate_curl_config as JSON Parameters: {"accessions"=>"SCP105", "auth_code"=>"nk04LPnr", "directory"=>"all", "context"=>"study"} Authenticating user via auth_token: nk04LPnr FireCloud API request (GET) https://api.firecloud.org/api/groups with tracking identifier: 60903e40e24139898f9729d0 Adding 7795450328 bytes to user: 60903e40e24139898f9729d0 download quota for bulk download Beginning creation of curl configuration for user_id, auth token: 60903e40e24139898f9729d0 2024-10-02 13:16:33 -0400: Logging analytics to Mixpanel for event name: file-download:curl-config 2024-10-02 13:16:33 -0400: Posting to Mixpanel. Params: {:url=>"https://terra-bard-dev.appspot.com/api/event", :headers=>{"Content-Type"=>"application/json", "Authorization"=>"Bearer [REDACTED]"}, :payload=>"{\"event\":\"file-download:curl-config\",\"properties\":{\"numFiles\":8,\"numMetadataFiles\":1,\"numExpressionFiles\":2,\"numClusterFiles\":1,\"numStudies\":1,\"studyAccessions\":[\"SCP105\"],\"numAzulStudies\":0,\"azulAccessions\":[],\"numAzulFiles\":0,\"numAzulAnalysisFiles\":0,\"numAzulSequenceFiles\":0,\"context\":\"study\",\"os\":\"Unknown\",\"appId\":\"single-cell-portal\",\"env\":\"development\",\"logger\":\"app-backend\",\"authenticated\":true,\"registeredForTerra\":true,\"abTests\":[]}}", :method=>"POST"} 2024-10-02 13:16:34 -0400: Bard error in call to https://terra-bard-dev.appspot.com/api/event: 503 Service Unavailable Curl configs generated for studies ["SCP105"], 8 total files Total time in generating curl configuration: 2 Seconds Rendering text template Rendered text template (Duration: 0.2ms | Allocations: 29) Sent data cfg.txt (10.6ms) Completed 200 OK in 5502ms (Views: 9.9ms | MongoDB: 3.1ms | Allocations: 251821)