Sage-Bionetworks / data_curator

Data and metadata ingress app
Apache License 2.0

New files added to a folder do not show up in the generated templates #193

Closed ychae closed 3 years ago

ychae commented 3 years ago

Describe the bug

From Marisol R at Vanderbilt:

I can generate the metadata spreadsheet now, but the spreadsheet only shows the older files, not the ones uploaded on 08/2021. The recently uploaded file names should be in the metadata spreadsheet for us to fill out, correct?

One of the folders that contain both the old (February) files and the new (August) files is here: https://www.synapse.org/#!Synapse:syn23564801

To Reproduce

Steps to reproduce the behavior:

  1. Go to Data Curator App
  2. Select "Project":"HTAN Vanderbilt", "Folder":"single_cell_RNAseq_level_1", "Template":"scRNA-seq Level 1"
  3. Click the three arrows to go to the generate template page
  4. Click the purple "Click to Generate Google Sheets Template" button
  5. Click on the link it generates
  6. Template doesn't contain the newest files uploaded in August

Expected behavior

The generated template should contain both the older and newer files that were uploaded for centers to annotate.

Additional Notes

Ideally, this fix would be in place by early October 2021.

milen-sage commented 3 years ago

schematic should handle generating a Google Sheet both when no manifest is present in a given dataset folder and when one is. schematic will return a URL to the resulting Google Sheet. The DCA should only need to call schematic's populate manifest function, without any logic of its own to decide whether an empty or a pre-populated Google Sheet should be generated.

milen-sage commented 3 years ago

@rrchai you can take a look at the updated PR 523 in schematic.

Instead of calling populateModelManifest, you should be able to call getModelManifest the same way as you do here.

The new files should be appended at the end of the manifest. Currently, all fields other than Filename and entityId are left blank (i.e. empty strings).
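The append behaviour described above can be sketched in Python, the language schematic is written in. This is a minimal illustration under stated assumptions, not the implementation from PR 523: the helper `append_new_files`, the rows-as-dicts representation, and the sample column names and Synapse IDs are all hypothetical. Passing `None` for the existing manifest covers the no-manifest case as well, so a caller needs no branching of its own.

```python
from typing import Dict, Iterable, List, Optional, Sequence

Row = Dict[str, str]


def append_new_files(
    columns: Sequence[str],
    existing_rows: Optional[List[Row]],
    folder_files: Iterable[Row],
) -> List[Row]:
    """Keep existing manifest rows as-is and append one row per unlisted file.

    Hypothetical helper: `folder_files` holds {"Filename", "entityId"} dicts for
    every file in the dataset folder; `existing_rows` is None when the folder has
    no manifest yet, in which case every file becomes a new row.
    """
    rows = [dict(r) for r in (existing_rows or [])]  # copy, don't mutate input
    known_ids = {r.get("entityId") for r in rows}
    for f in folder_files:
        if f["entityId"] in known_ids:
            continue  # already listed (and possibly annotated) in the stored manifest
        new_row = {col: "" for col in columns}  # every other field left blank
        new_row["Filename"] = f["Filename"]
        new_row["entityId"] = f["entityId"]
        rows.append(new_row)  # new files go at the end
        known_ids.add(f["entityId"])
    return rows


# Illustrative data only: column names and Synapse IDs are placeholders.
columns = ["Filename", "Component", "entityId"]
existing = [{"Filename": "old_R1.fastq.gz", "Component": "ScRNA-seqLevel1", "entityId": "syn111"}]
files = [
    {"Filename": "old_R1.fastq.gz", "entityId": "syn111"},
    {"Filename": "new_R1.fastq.gz", "entityId": "syn222"},
]

rows = append_new_files(columns, existing, files)
print(rows[1])  # {'Filename': 'new_R1.fastq.gz', 'Component': '', 'entityId': 'syn222'}
```

Matching on entityId rather than Filename is itself an assumption in this sketch: it treats the Synapse ID as the stable identifier for each file.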

To test in schematic for the dataset referenced above in this issue:

    schematic manifest -v INFO --config ./config-htan.yml get --data_type ScRNA-seqLevel1 --title scRNA-seqLevel1 -d syn23520241 --oauth --sheet_url

Output here: https://docs.google.com/spreadsheets/d/19BH8yUc8AtmVVPaYajdwUDMkVPOR-jmuNPm8WGP4TVs/edit#gid=0

milen-sage commented 3 years ago

@ychae thank you for the detailed bug description across these issues (e.g. #190, #192 and #193) :)

ychae commented 3 years ago

@milen-sage @rong I just tested and I see a lot of blank rows, and then once I scroll down I see the old file names. If I scroll over to the right, though, I can see all of the Synapse IDs.

Blank file names, followed by the filled-in old file names:

[Screenshot: blank_filenames]

Filled-in columns, including Synapse IDs for all files:

[Screenshot: synapseIds]

milen-sage commented 3 years ago

@ychae I see the biospecimen component there. Aren't these files single cell? If the files are single cell, could you try selecting single cell RNA seq in the template dropdown?

Biospecimen is not a file-based data type, and the expected behavior is different with respect to pulling an updated list of files (i.e. files do not apply to Biospecimen data; e.g. the data model doesn't include the attribute Filename for Biospecimen).
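To make the file-based distinction concrete, here is a minimal Python sketch, assuming the only deciding factor is whether a component's attributes include Filename. The function `is_file_based` and both attribute lists are hypothetical illustrations, not schematic's code or the actual HTAN data model.

```python
from typing import Sequence


def is_file_based(template_attributes: Sequence[str]) -> bool:
    """Hypothetical check: a component counts as file-based only if its data
    model defines a Filename attribute; only then would the folder's file list
    be pulled into the manifest."""
    return "Filename" in template_attributes


# Illustrative attribute lists, not the actual HTAN data model.
scrna_level1_attrs = ["Component", "Filename", "File Format"]
biospecimen_attrs = ["Component", "Biospecimen ID"]

print(is_file_based(scrna_level1_attrs))  # True
print(is_file_based(biospecimen_attrs))  # False
```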

ychae commented 3 years ago

@milen-sage it's really odd, but I selected the correct scRNA-seq Level 1 template, yet that's what it generated. Here's a screenshot of the dropdown values:

[Screenshot: Screen Shot 2021-10-01 at 12 51 27 PM]

milen-sage commented 3 years ago

@rrchai accidentally submitted a Biospecimen manifest for the Vanderbilt folder while testing today. I restored Vanderbilt's manifest. @ychae could you test the app again?

ychae commented 3 years ago

Whoo-hoo! Looks good! Love it when it's an easy fix 😁

I can see the annotated old files and the filenames of the new files, with no annotations. Looks good to share with Marisol.

Thanks Rong! 🎉 Thanks Milen! 🙌

rrchai commented 3 years ago

@ychae Good catch!! And thank you @milen-sage for a quick fix.

I realized that while I was testing the submission function, I accidentally submitted some files from another project to the Vanderbilt scRNA_seq_level1 folder. If I remember correctly, it originally contained only synapse_storage_manifest.csv and FASTQ_raw.

@milen-sage Is it possible to restore the folder or delete the extra files?

Sorry about the mistake!

milen-sage commented 3 years ago

Yes, we can delete the unneeded entities.

rrchai commented 3 years ago

Closing the issue since it's been fixed. Feel free to reopen it if the issue persists.