airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

pcr_target_locus VS cell_subset - general question #413

Closed bcorrie closed 3 years ago

bcorrie commented 4 years ago

A metadata clarification question - from a UI/display perspective on the iReceptor Gateway...

If one cell-sorts into, say, different types of B cells, and therefore has a cell_subset, would one always, typically, or never have a pcr_target_locus? I suspect the answer is sometimes, but I am wondering whether the answer would be yes more often than not...

Putting the question a different way: if you were a user sitting in front of the iReceptor Gateway, looking at data from 2000 repertoires and 30 studies, and you had to choose, would you rather see summary stats about pcr_target_locus or cell_subset?

bussec commented 4 years ago

cell_subset (because I already put a filter on keywords_study==contains_ig and pcr_target_locus does not offer that much more information).

The main reason not to have a pcr_target_locus would be techniques in which you do not perform targeted amplification. This is probably not the majority of studies, but everything that does RNA-seq would fall into this category, and that might increase in the future (especially for single-cell).

bcorrie commented 4 years ago

@bussec - curve ball...

keywords_study is a study-level filter. So if a study contains both IG and TCR, would you expect that filter to provide the differentiation that you are after for a specific repertoire? For a given repertoire R, where R belongs to a study that has both IG and TCR, what does keywords_study contain? 8-)
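To make the point concrete, here is a minimal sketch of the two filters side by side. The records are flattened, hypothetical stand-ins (real AIRR repertoire metadata nests these fields more deeply), and the keyword value `contains_tr` is an assumption made for illustration; only `contains_ig` is mentioned in this thread.

```python
# Hypothetical, flattened records: two repertoires from the same mixed IG/TCR study.
# "contains_tr" is an assumed keyword value used only for illustration.
repertoires = [
    {"repertoire_id": "R1",
     "keywords_study": ["contains_ig", "contains_tr"],
     "pcr_target_locus": "IGH"},
    {"repertoire_id": "R2",
     "keywords_study": ["contains_ig", "contains_tr"],
     "pcr_target_locus": "TRB"},
]

# Study-level filter: both repertoires inherit the same study keywords,
# so both pass, including the TCR repertoire.
by_study_keyword = [
    r["repertoire_id"]
    for r in repertoires
    if "contains_ig" in r["keywords_study"]
]

# Repertoire-level filter: the locus prefix separates IG from TCR repertoires.
# A missing locus (None) is treated as "no match".
by_locus = [
    r["repertoire_id"]
    for r in repertoires
    if (r["pcr_target_locus"] or "").startswith("IG")
]
```

In this sketch the study-level filter keeps both R1 and R2, while the locus-based filter keeps only R1.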

bussec commented 4 years ago

> For a given repertoire R, and R is a repertoire that belongs to a study that has both IG and TCR, what does keywords_study contain?

Point taken, of course it contains both. Nevertheless, the resolution that pcr_target_locus will give you is basically "B vs. T", and this is where cell_subset offers more depth (your original post mentioned "different types of B cells", so I thought this could be valuable).

The downside of using cell_subset is of course that you might end up with a rather fragmented display, due to the numerous populations. To avoid this, you could bin them initially at the three top nodes (B, alpha:beta T, gamma:delta T) and then allow the user to "zoom" in. But it is easy for me to say this, as I don't have to implement it in the UI ;-)
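The binning idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the cell_subset labels and the `TOP_LEVEL_BIN` lookup table are hypothetical, and a real implementation would more likely derive the bins from the ontology hierarchy than from a hand-written table.

```python
from collections import Counter

# Hypothetical lookup from cell_subset labels to the three top-level bins.
# The labels are illustrative, not taken from any real study.
TOP_LEVEL_BIN = {
    "naive B cell": "B",
    "memory B cell": "B",
    "plasmablast": "B",
    "CD4-positive, alpha-beta T cell": "alpha:beta T",
    "CD8-positive, alpha-beta T cell": "alpha:beta T",
    "gamma-delta T cell": "gamma:delta T",
}


def bin_cell_subsets(labels):
    """Count repertoires per top-level bin; unknown or missing labels go to 'other'."""
    return Counter(TOP_LEVEL_BIN.get(label, "other") for label in labels)


def zoom(labels, top_bin):
    """Count the individual cell_subset labels that fall inside one top-level bin."""
    return Counter(
        label for label in labels
        if TOP_LEVEL_BIN.get(label, "other") == top_bin
    )


# One cell_subset label per repertoire; None stands for a repertoire without one.
labels = [
    "naive B cell",
    "memory B cell",
    "naive B cell",
    "CD8-positive, alpha-beta T cell",
    "gamma-delta T cell",
    None,
]

summary = bin_cell_subsets(labels)   # coarse view shown first
b_detail = zoom(labels, "B")         # detail shown when the user "zooms" into B
```

The coarse summary keeps the initial display to a handful of rows, and the per-bin detail is computed only when the user asks for it.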

bcorrie commented 3 years ago

@bussec @schristley the question I asked above about the UI aspect is resolved. Any objection to me closing this? If so, reopen 8-)