Closed smorzechowski closed 8 months ago
As an addendum, I found a tiny bug when trying to manually specify gap characters. I tried the following:
clipkit $alignment --gap_characters "-?*XxNn"
but got this error:
clipkit: error: argument -gc/--gap_characters: expected one argument
When I tried without quotes:
clipkit $alignment --gap_characters -?*XxNn
I got the same error message.
However, when I specified just Nn or NnXx*?-, e.g.
clipkit $alignment -gc NnXx*?-
it works just fine without throwing any errors, and all the characters are used in the output file!
Hi @smorzechowski,
Firstly, thank you so much for using ClipKIT and for writing to us about your issue. We really appreciate community members who help improve the overall experience and quality of ClipKIT.
Our apologies for the confusion regarding gap characters.
The help message was insufficiently clear, sorry about that. Amino acid gaps are Xx-?* and nucleotide gaps are XxNn-?*.
Regarding the error message when specifying gaps as -?*XxNn, the parser gets confused when the gap characters start with - because it treats the string as another option flag rather than as the value for --gap_characters. That is why the error message says that an argument was expected: the parser never sees a value for the option. This clarification has been added to the documentation and help message.
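The behavior described above can be reproduced with a small stand-alone parser. The sketch below is not ClipKIT's actual code: it assumes the CLI is built on Python's argparse (the error format suggests so), and `build_parser` and `parse_gap_chars` are hypothetical names used only for illustration. It also shows that argparse itself accepts a value with a leading dash when it is attached with `=`; whether ClipKIT's own CLI handles that form the same way has not been verified here.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for a CLI with a -gc/--gap_characters option.
    parser = argparse.ArgumentParser(prog="clipkit-sketch")
    parser.add_argument("input")
    parser.add_argument("-gc", "--gap_characters")
    return parser


def parse_gap_chars(argv):
    """Return the parsed gap-character string, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).gap_characters
    except SystemExit:
        # argparse prints "expected one argument" to stderr and exits.
        return None


if __name__ == "__main__":
    # A value starting with "-" is read as an option flag, so this fails.
    print(parse_gap_chars(["aln.fa", "--gap_characters", "-?*XxNn"]))
    # Moving "-" away from the first position avoids the problem.
    print(parse_gap_chars(["aln.fa", "-gc", "NnXx*?-"]))
    # Attaching the value with "=" also lets argparse keep the leading "-".
    print(parse_gap_chars(["aln.fa", "--gap_characters=-?*XxNn"]))
```

Shell quoting does not help in the first case because the quotes are removed before the program sees its arguments, which matches the identical error reported with and without quotes.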
You can download the latest ClipKIT release, version 2.2.4, using pip3 install clipkit -U.
Also, to check that the gap characters are being interpreted correctly, ClipKIT prints all user arguments:
-------------
| Arguments |
-------------
Input file: example_file.fa (format: fasta)
Output file: example_file.fa.clipkit (format: fasta)
Sequence type: Nucleotides
Gaps threshold: 1
Gap characters: ['?', '*', 'X', 'x', 'N', 'n']
Trimming mode: smart-gap
Create complementary output: False
Process as codons: False
Create log file: False
(see the line: Gap characters).
Thank you again for your message!
best,
Jacob
Thanks for this wonderful tool! I wanted to clarify some information in the help file about the defaults for the --gap_characters argument. Should the default for nucleotides include N as a gap character, as the information under Sequence type suggests?
I just wanted to check, since I believe the defaults under --gap_characters indicate the opposite.
Thanks very much!
Technical Details