NIEHS / ROTLA

MIT License

mtDNA reference sequence invalid #2

Open Pilibeibei opened 3 years ago

Pilibeibei commented 3 years ago

I used rCRS (NC_012920) as the mtDNA reference sequence and received the error "Invalid fasta format: sequence size == 0". I also tried the reference sequence file "chrM.fa" from MToolBox and received the same error. Could you please provide a sample of the data format you used for the chrM reference sequence? It would help users follow the tool's workflow. Thanks!

Error message: Invalid fasta format: sequence size == 0 for element (J�r ���id�0G� Invalid fasta format: sequence size == 0 for element z��S�ߍ�!�>��,���5!� Invalid fasta format: sequence size == 0 for element �J����o�6��R�HpTFB�o�4�\ń��1��/4˚ Invalid fasta format: sequence size == 0 for element A5�'c� Invalid fasta format: sequence size == 0 for element V���e���l���k{����t"�����ٍ;Qa-��,��:v4��Sq���T��D=AF������H,t6%a��!������K Invalid fasta format: sequence size == 0 for element �͎6�!��PB�'����O>�x�]���+�yړ���p�q?�ʸ� Invalid fasta format: sequence size == 0 for element ����ܝ3~0�'"`��g��F� Invalid fasta format: sequence size == 0 for element ث.��A Invalid fasta format: sequence size == 0 for element =p�/�x�^�hP������+f�i�-��$�ݴ��G�Yvp���G�yպ�HRi��I�X�D�$��/j�m��r��I�C�� Invalid fasta format: sequence size == 0 for element &Ni� Invalid fasta format: sequence size == 0 for element �#��7��χ�t-^��ʷl�er��g�xJ�6M������4�+�L}

burkholderab commented 3 years ago

Any standard FASTA file, such as those you describe, should be compatible. The various non-letter characters in the error suggest that the tool may be attempting to process a binary rather than plaintext file. Is it possible that your FASTA is compressed? If you send a copy of the command you ran to receive this output, I may be able to provide further suggestions.
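
One quick way to test the "binary rather than plaintext" hypothesis is to look at the first two bytes of each input file: gzip streams begin with the signature bytes 0x1f 0x8b. The snippet below is a minimal sketch of that check, not part of ROTLA; the function name `is_gzipped` is a hypothetical helper introduced here for illustration.

```python
GZIP_MAGIC = b"\x1f\x8b"  # two-byte signature at the start of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip signature.

    Hypothetical helper for illustration; it inspects only the first two
    bytes and does not validate the rest of the file.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


# Example use (file names are placeholders):
#   for name in ("s1.R1.fastq.gz", "s1.R2.fastq.gz", "chrM.fa"):
#       print(name, is_gzipped(name))
```

Running the check on every file passed to the tool, not only the reference FASTA, helps narrow down which input is being read as binary.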

Pilibeibei commented 3 years ago

I used the command: ROTLA find-breakpoints s1.R1.fastq.gz s1.R2.fastq.gz chrM.fa s1

I tried different versions of chrM.fa, and all produced the same error. The FASTA is not compressed. Below are the mtDNA reference sequence files I have tried:

  1. chrM.fa from MToolBox
  2. AF347015.1.fa downloaded directly from NCBI
  3. NC_012920.fa downloaded directly from NCBI

Your help is much appreciated!

burkholderab commented 3 years ago

It looks like the issue is the use of gzipped FASTQ files. ROTLA requires uncompressed, plaintext FASTQs at this time. I can look into adding the ability to automatically recognize and decompress gzipped files in the near future, but for now, try unzipping them.
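
As a rough sketch of the workaround, the gzipped FASTQs can be decompressed to plaintext copies before running the tool. The snippet uses only Python's standard `gzip` and `shutil` modules; the function name `gunzip_copy` is a hypothetical helper, and the file names in the comments are taken from the command earlier in this thread.

```python
import gzip
import shutil


def gunzip_copy(src, dst):
    """Write a decompressed copy of gzip file `src` to `dst`.

    Hypothetical helper for illustration; the original .gz file is left
    untouched, and the data is streamed rather than loaded into memory.
    """
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dst


# Example use (file names from the command in this thread):
#   gunzip_copy("s1.R1.fastq.gz", "s1.R1.fastq")
#   gunzip_copy("s1.R2.fastq.gz", "s1.R2.fastq")
```

The uncompressed files can then be passed in place of the .gz files, presumably as: ROTLA find-breakpoints s1.R1.fastq s1.R2.fastq chrM.fa s1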

Pilibeibei commented 3 years ago

Thanks, it works.

Looking forward to the new version that can read .gz files.