airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Format conversion in airr #698

Closed llrs closed 1 year ago

llrs commented 1 year ago

Hi, I'm new to T cell receptors and Adaptive Immune Receptor Repertoire, and I have some questions (I'm not sure whether they would be better asked in Slack or here, but given the open nature of the software I prefer to ask them here).

I'm analyzing TCRs for a project, and this is my first time dealing with this kind of data. In this project we use AdaptiveBiotech, and we used some software to infer TCRs from sequencing: TraCeR/mixcr.

The two methods do not provide the same IDs for the TCRs/rearrangements. I found the software immunoseq2airr to convert between the outputs, and later I found airr (I mostly use R). I've seen an issue (#501) reporting that the R package is not up to date with the Python one. Is there interest in adding format conversions to the package, or are immunarch, alakazam, or other packages more suitable for this?

I also missed the package when I searched CRAN, because the description does not make clear what AIRR means or how it relates to T cell receptors. Perhaps it would be better to mention TCR and/or spell out the acronym so that other users can find it. I hope this helps.

schristley commented 1 year ago

Hi @llrs, the AIRR library doesn't do any format conversion. Instead, we encourage tools to support the AIRR format for input or output.

llrs commented 1 year ago

I understand that the package doesn't do any format conversion. My question is whether you would like to have it.

javh commented 1 year ago

Hi @llrs,

Thanks for the suggestion. No, our scope is solely the data standards themselves, not their implementation per se, nor conversion from other formats. Although we do maintain the airr R and Python reference implementations, their scope is pretty narrow.

Issue #501 should be stale at this point (we have some issue cleanup to do), but if there's something specific that's currently inconsistent between the Python and R packages, please let us know and we'll look at it. A lot of the current differences are due to (a) more features in the Python package (for various historical reasons) or (b) stylistic differences between the two languages.

And, yes, Slack would be a more helpful place for analysis-related questions.

Good point about the abbreviation. We can spell it out in the R package description.

PS: mixcr does support the AIRR standard as of v4.0.