initoby closed this issue 5 years ago
Thanks Ini. All – This means no data from Adaptive Biotech can be submitted. Perhaps that’s fine since they have their own repository, but then I guess that would need to be mentioned in the standard.
Thank you. Yes, we should mention that this could be a problem. We just have to decide whether the manuscript should say specifically that it can't be used for Adaptive reads, or stay more neutral (only for sequences of at least 200 bp) and be more specific only in the detailed documentation.
Florian
Florian Rubelt, Dr. Mark M. Davis Laboratory Howard Hughes Medical Institute Stanford University School of Medicine
On Oct 13, 2017, at 2:31 PM, lgcowell <notifications@github.com> wrote:
Thanks Ini. All – This means no data from Adaptive Biotech can be submitted. Perhaps that’s fine since they have their own repository, but then I guess that would need to be mentioned in the standard.
Lindsay G. Cowell, PhD, Associate Professor, Division of Biomedical Informatics, Department of Clinical Sciences, University of Texas Southwestern Medical Center, Lindsay.Cowell@utsouthwestern.edu, 214-648-2289
Administrative Assistant: Mack Dressler, Mack.Dressler@UTSouthwestern.edu, 214-648-2558
From: initoby <notifications@github.com>
Reply-To: airr-community/airr-standards <reply@reply.github.com>
Date: Friday, October 13, 2017 at 3:55 PM
To: airr-community/airr-standards <airr-standards@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Subject: [airr-community/airr-standards] Sequences less than 200bp not accepted by GenBank for AIRR submission (#26)
The notes below are from Lori Black at NCBI GenBank, sent in an email response about some of the sequences in one of the FASTA files I had submitted for the AIRR standards.
[2] Many of the sequence(s) in your file(s) are less than 200 bp.
Unfortunately, we must inform you that we have a policy not to accept sequences shorter than 200 bp. We realize that this has short-term consequences for submitters, but feel that the long-term improvements in the database will be helpful for all database users.
If you resubmit your sequence submission(s) with additional sequence, we may then be able to accept your sequence(s). Alternatively, if you would like us to delete the sequence(s) that are under 200 bp and proceed with the rest, please inform us.
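For anyone who wants to check a file before submitting, here is a minimal sketch of splitting FASTA records at the 200 bp threshold quoted above. It is not from the thread or from any NCBI tool: the function names `parse_fasta` and `split_by_length` are hypothetical, and it assumes a plain FASTA file with `>` header lines and no other validation.

```python
from typing import Iterable, Iterator, List, Tuple

# Threshold taken from the GenBank response quoted above:
# sequences shorter than 200 bp are not accepted.
MIN_LENGTH_BP = 200

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(
    records: Iterable[Record], min_length: int = MIN_LENGTH_BP
) -> Tuple[List[Record], List[Record]]:
    """Separate records that meet the minimum length from those that do not."""
    kept: List[Record] = []
    too_short: List[Record] = []
    for header, seq in records:
        if len(seq) >= min_length:
            kept.append((header, seq))
        else:
            too_short.append((header, seq))
    return kept, too_short


# Hypothetical input: one 220 bp record (wrapped over two lines) and one 150 bp record.
example_lines = [">long_read", "A" * 120, "C" * 100, ">short_read", "G" * 150]
kept, too_short = split_by_length(parse_fasta(example_lines))
print("kept:", [h for h, _ in kept])
print("too short:", [h for h, _ in too_short])
```

To run it on a real file, pass an open file handle to `parse_fasta` instead of `example_lines`. The records in `too_short` are the ones GenBank would ask to have extended or deleted.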
Hi Ini. Thanks, that's clearly a limitation we were not aware of. What kind of reads did you try to submit?
Although I can't speak to how important this GenBank limitation is, it does underline the importance of ensuring that we don't "bias" the terminology in the standard toward the NCBI repositories. I know that a lot of work has gone into mapping the standard to NCBI, but we should be careful with the wording in MiAIRR so that we do not make it too NCBI-focused.
We have had a couple of internal discussions about our curation process, and one of the comments was that the descriptions of some of the MiAIRR fields explicitly say that they require an NCBI identifier, when not all studies end up in NCBI.
For example: "1 / study Study String Alphanumeric UID assigned by NCBI"
This should probably be changed to something like:
"Unique alphanumeric ID for the study, typically the UID assigned by an international repository such as NCBI"
Excellent point.
The latter description has been changed to "Unique ID assigned by study registry" in 8f63c1e8dde61ba771bc062f1fa74f74c061ac25. This commit also includes other changes that should make the content definitions independent of NCBI.