HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

Add field for 10x v3 feature barcodes #1125

Open samanehsan opened 5 years ago

samanehsan commented 5 years ago

For which schema is a change/update being suggested?

I'm not sure which schema would be appropriate here

What should the change/update be?

I would like to add a new field that indicates whether feature barcodes were used for a 10x v3 sequencing assay.

Why is the change requested?

Whether or not a 10x v3 sequencing experiment uses feature barcoding will affect how the data is analyzed, so that information needs to be captured in the metadata.

lauraclarke commented 5 years ago

@samanehsan, do you think there will be general processing that is uniform across different types of feature barcode use, or will different assay types need different solutions?

Are there assays where feature barcodes might or might not be used?

I see the goal of ensuring that we can know when feature barcodes have been used. I am trying to understand whether this needs to be encoded independently of other assay labels.

@willrockout, I know you have looked at CITE-seq and have plans for it; how would this work with what you are proposing?

samanehsan commented 5 years ago

@lauraclarke, the @HumanCellAtlas/pipelines-computational-biologists will be able to answer your questions about feature barcoding!

kbergin commented 5 years ago

My limited understanding of feature barcoding is that there are many ways to incorporate and use them, and I think they all impact analysis choices. We plan to do a spike on feature barcoding pipelines next Q and would have more info then. But hopefully someone tagged in Saman's comment can provide more insight.

barkasn commented 5 years ago

In the short term we aim to support V3 without feature barcodes, so we need to make sure that we can differentiate datasets with and without them; a boolean flag is appropriate for that. I have not looked into V3 feature barcoding closely enough to say whether further metadata will be required when feature barcodes are enabled. A spike is required.
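A minimal sketch of the boolean-flag approach, in Python. The record type, the field name `feature_barcodes_used`, and the `"10x v3"` label string are hypothetical and not part of the HCA schema. The design point it illustrates: an unrecorded flag is treated as "unknown" rather than as "no feature barcodes", so a short-term V3 pipeline would only pick up datasets that explicitly declare the absence of feature barcodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibraryPrep:
    # Hypothetical record; field names are illustrative, not the HCA schema.
    library_construction_approach: str
    feature_barcodes_used: Optional[bool] = None  # None means "not recorded"


def supported_by_v3_pipeline(prep: LibraryPrep) -> bool:
    """True only for 10x v3 libraries explicitly flagged as having no feature barcodes."""
    is_v3 = prep.library_construction_approach == "10x v3"  # hypothetical label string
    # An unrecorded flag (None) is treated as unknown, so it is NOT accepted.
    return is_v3 and prep.feature_barcodes_used is False
```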

lauraclarke commented 5 years ago

This sounds like it would be a good topic for a metadata call.

If a general feature barcoding tag is useful, I am very happy to add it. But I am trying to understand whether the majority of feature barcoding uses will actually be handled differently; if so, specific labels for each type of feature barcoding might be a better solution than a generic boolean field.

lauraclarke commented 5 years ago

As a specific example, right now you avoid analysing the CITE-seq dataset because that is the label in its library construction approach.

It might be better to make sure that library construction approaches which use a particular feature barcoding strategy needing a specialist pipeline list it explicitly, in library construction or elsewhere, rather than relying on a binary flag and defaulting to a library construction of 10x v3 just because a 10x v3 machine was used.

barkasn commented 5 years ago

If I understand you correctly, I agree that having the presence or absence of feature barcodes (FBs) in the library construction tag might be a better approach.

I don't have a preference as long as we can clearly differentiate the libraries that don't use feature barcoding. One thing I would urge, however, if we go down this path, is to avoid a generic tag like "V3", as it will be unclear what it means and users will select it with different assumptions (i.e. some "V3"-tagged datasets will have FBs and some will not). We want something like "V3 no feature barcodes" and "V3 with feature barcodes", or something along those lines.

Hope this is clear, happy to jump into a call if clarification is required.

I also wanted to note that 10X V3 chemistry does not require (to the best of my knowledge) a different instrument version, just a different kit.
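The explicit-label alternative can be sketched as a closed set of library construction terms. The two label strings below are hypothetical, modeled on the "V3 no feature barcodes" / "V3 with feature barcodes" suggestion, and are not official HCA ontology terms. A bare "10x v3" is rejected rather than silently mapped to either term.

```python
from enum import Enum


class LibraryConstruction(Enum):
    # Hypothetical labels following the suggestion above; not official HCA terms.
    V3_NO_FEATURE_BARCODES = "10x v3 without feature barcodes"
    V3_WITH_FEATURE_BARCODES = "10x v3 with feature barcodes"


def parse_library_construction(label: str) -> LibraryConstruction:
    """Map a submitted label to an explicit term, refusing the ambiguous generic 'v3'."""
    try:
        # Enum lookup by value raises ValueError for any label not in the set.
        return LibraryConstruction(label)
    except ValueError:
        raise ValueError(
            f"ambiguous or unknown library construction label: {label!r}; "
            "state whether feature barcodes were used"
        ) from None
```

Because the set is closed, a submitter cannot choose "V3" with an unstated assumption; the presence or absence of feature barcodes is part of the term itself.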

hewgreen commented 5 years ago

Could the current barcodes module be utilised? https://github.com/HumanCellAtlas/metadata-schema/blob/master/json_schema/module/process/sequencing/barcode.json

lauraclarke commented 5 years ago


I wasn't on the call, but I think that is what @willrockout was thinking about for CITE-seq.

So perhaps the presence or absence of one of those fields could be used as a definitive declaration of whether barcodes are present.

barkasn commented 5 years ago

I don't know about the existing representation of barcodes and would like to learn more. However, feature barcodes are used very differently from library barcodes or cell barcodes, so I would generally not expect the same representation to work unless it is very general.

lauraclarke commented 5 years ago

This module is actually for imaging barcodes.

hewgreen commented 5 years ago

@lauraclarke, no, it's for sequencing barcodes. It's the metadata you need to locate the barcodes. As Nick says, it is currently used for cell and UMI barcodes. There are basically three fields: barcode_read, barcode_offset and barcode_length.

so using this would give you:

feature.barcode_read, feature.barcode_offset, feature.barcode_length

The presence or absence of feature.barcode_read, together with a library construction of 10x v3, would tell you whether the pipeline needs to look for feature barcodes. The other two fields then tell you where to look (although I guess this would be standard for 10x).
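A minimal Python sketch of how a pipeline might consume the barcode module as proposed. The `Barcode` fields mirror barcode_read, barcode_offset and barcode_length. The `SequencingProcess` container, the `feature_barcode` slot standing in for the proposed feature.* fields, the "10x v3" label string, and the assumption that offsets are 0-based are all illustrative assumptions, not the actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Barcode:
    # Mirrors the three fields described above; value formats are assumptions.
    barcode_read: str    # which read carries the barcode, e.g. "Read 2" (illustrative)
    barcode_offset: int  # assumed 0-based start position within that read
    barcode_length: int


@dataclass
class SequencingProcess:
    # Hypothetical container; feature_barcode stands in for the proposed feature.* fields.
    library_construction_approach: str
    cell_barcode: Optional[Barcode] = None
    umi_barcode: Optional[Barcode] = None
    feature_barcode: Optional[Barcode] = None


def needs_feature_barcode_step(proc: SequencingProcess) -> bool:
    """Declaration: a 10x v3 run needs feature-barcode handling only if one is present."""
    return (
        proc.library_construction_approach == "10x v3"  # hypothetical label string
        and proc.feature_barcode is not None
    )


def extract_barcode(read_sequence: str, barcode: Barcode) -> str:
    """Location: slice the barcode out of a read using its offset and length."""
    start = barcode.barcode_offset
    return read_sequence[start:start + barcode.barcode_length]
```

Keeping the declaration (whether a feature barcode is present) separate from its location (offset and length) means the same check still works if the position turns out to be standard for 10x.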