brineylab / abstar

VDJ assignment and antibody sequence annotation. Scalable from a single sequence to billions of sequences.
MIT License

problem to getting output for VDJ assignment #112

Open saramoein372 opened 2 years ago

saramoein372 commented 2 years ago

Hello, I have a FASTA file in the format below, and I would like to get a VDJ assignment for each sequence based on a reference (IMGT or IgBLAST). After running abstar on my FASTA file, no output was generated.

Could you please tell me which command I should run to get the VDJ assignment for each sequence?

These are some lines from the FASTA file:

A00814:550:HYJTNDSX2:4:1301:32931:2159 -1 3613904 3898754 barcode:CATGACAGTTCGGCAC umi:1158
GTCCCAGGTCACCATCACCGGCTCCGGGAAGTAGCCCGTGGCCAGGCAGCCCAGAGTCACGGAGGTGGCATTGGAGGGAATG
A00814:550:HYJTNDSX2:4:2208:32949:8954 -1 3590858 3897376 barcode:AAATGCCTCCAAACAC umi:29477
GTCCCAGGTCACCATCACCGGCTCCGGGAAGTAGCCCGTGGCCAGGCAGCCCAGAGTCACGGAGGTGGCATTGGAGGGAATGTTTTT
A00814:550:HYJTNDSX2:4:1117:27362:15436 -1 2780052 3085745 barcode:ACGCAGCCATTATCTC umi:41999
GCGTTATCCACCTTCCACTGTACTTTGGCCTCTCTGGGATAGAAGTTATTCAGCAGGCACACAACAGAGGCAGTTCCAGATTTCAACTGC

briney commented 2 years ago

Hi Sara-

A couple things. First, your sequences don't appear to be in FASTA format, which is:

>sequence_id1
ATGC
>sequence_id2
CGTA

The sequence ID lines in your file appear to be lacking the leading > character. Improperly formatted FASTA files will likely fail when being read by abstar, which uses biopython's SeqIO.parse() function for reading input files.
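As a rough sketch of the fix, the snippet below prepends the missing > to each ID line. It assumes the input strictly alternates between one ID line and one sequence line, which is a guess based on the excerpt above and not something confirmed for the whole file. The function name `to_fasta` is hypothetical and is not part of abstar or Biopython.

```python
def to_fasta(lines):
    """Convert alternating ID/sequence lines into FASTA-formatted text.

    Assumption: every record is exactly two lines (ID line, then sequence).
    Blank lines are ignored. ID lines that already start with '>' are kept.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    if len(cleaned) % 2 != 0:
        raise ValueError("expected alternating ID and sequence lines")

    records = []
    # Even positions hold IDs, odd positions hold the matching sequences.
    for header, seq in zip(cleaned[0::2], cleaned[1::2]):
        if not header.startswith(">"):
            header = ">" + header
        records.append(f"{header}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    example = [
        "sequence_id1",
        "ATGC",
        "sequence_id2",
        "CGTA",
    ]
    print(to_fasta(example), end="")
```

Note that FASTA parsers such as Biopython's SeqIO.parse() treat only the first whitespace-delimited word of the header as the record ID, so the barcode and umi fields would end up in the description and not in the ID.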

Second, it doesn't appear that the sequences you provided are antibody recombinations. Abstar was unable to confidently identify any VDJ genes, and this was verified by testing the sequences using IMGT/V-QUEST, which was also unable to find any VDJ genes.

saramoein372 commented 1 year ago

Hi Bryan,

Thanks for your response. Do you know of any tool I can use to get the VDJ assignment for my sequences in the format you saw?

Regards, Sara
