unique species id in `process_input()`

Hi, @iaindhay

Species IDs are created from the list names and must be 3 to 5 characters long. Under the hood, the function that creates them (`create_species_id_table()`; see its documentation) takes the list names and extracts the first 3 characters. If the resulting IDs are not unique, it tries 4 characters; if they are still not unique, it tries 5; and if repeated IDs remain even with 5 characters, it uses 4 characters plus a number for the repeated ones.

For example, suppose your list names are:

```r
> names(seq)
[1] "Arabidopsis_thaliana" "Arabidopsis_lyrata"   "Brassica_rapa"
```

Here, even the 5-character prefixes contain a duplicate ("Arabi" appears twice), so the function adds numbers to tell the IDs apart. The resulting unique IDs are `c("Arabi", "Arab2", "Brass")`.
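For illustration, here is a minimal R sketch of the rule described above. `make_species_ids()` is a hypothetical helper, not the source of syntenet's `create_species_id_table()`, and it returns only a character vector of IDs; it assumes, as in the example, that the first occurrence keeps its 5-character prefix and later repeats get 4 characters plus an occurrence number.

```r
# Hypothetical sketch of the ID rule; NOT syntenet's actual implementation.
make_species_ids <- function(species_names) {
    # Try 3-, 4- and 5-character prefixes; return the first unique set
    for (n in 3:5) {
        ids <- substr(species_names, 1, n)
        if (!anyDuplicated(ids)) return(ids)
    }
    # Still duplicated at 5 characters: number each occurrence within its group
    occurrence <- ave(seq_along(ids), ids, FUN = seq_along)
    dup <- occurrence > 1
    # Later occurrences become 4 characters + occurrence number (e.g. "Arab2")
    ids[dup] <- paste0(substr(species_names[dup], 1, 4), occurrence[dup])
    ids
}

make_species_ids(c("Arabidopsis_thaliana", "Arabidopsis_lyrata", "Brassica_rapa"))
#> [1] "Arabi" "Arab2" "Brass"
```

Keeping the first occurrence at 5 characters makes the output match the example above. Note that with ten or more repeats of the same prefix, this sketch would produce IDs longer than 5 characters.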

That said, if you want custom IDs, you can set them as the list names (which `create_species_id_table()` uses to build the IDs), but bear in mind that only the first 3-5 characters are kept. Genome accessions longer than 5 characters therefore cannot be used in full as IDs.
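A short R sketch of this caveat follows. The accession-style strings are made up for illustration, and `seq` is assumed to be the named list of sequences you pass to `process_input()` (here just a placeholder list with empty elements).

```r
# Made-up accession-style names (illustrative only, not real assemblies)
accessions <- c("GCA_000000001.1", "GCA_000000002.1", "GCA_000000003.1")

# At most 5 characters are used, and these all share the same 5-character prefix
substr(accessions, 1, 5)
#> [1] "GCA_0" "GCA_0" "GCA_0"

# Placeholder for your named list of sequences
seq <- setNames(vector("list", 3), accessions)

# Short, distinct custom names are kept intact as IDs
names(seq) <- c("Ath", "Aly", "Bra")
names(seq)
#> [1] "Ath" "Aly" "Bra"
```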

Best, Fabricio

almeidasilvaf/syntenet, issue #15