iaindhay closed this issue 1 year ago
Hi, @iaindhay
Thank you for using syntenet.
I guess you misunderstood what the internal function check_list_names() does. This function checks if lists (seq and annotation) have the same names. By visual inspection, I can see that names are different. You can confirm that by executing:
> names(aastringsetlist)
[1] "processed_NC_004578_cds_protein_sequences" "processed_NC_005773_cds_protein_sequences"
> names(grangeslist)
[1] "processed_NC_004578_cds_features" "processed_NC_005773_cds_features"
I would suggest renaming them to keep only the 'NC_...' part, or you could give them a better (human-readable) name.
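As a minimal sketch of the renaming suggested above: the snippet below uses plain lists as stand-ins for the real AAStringSet list and GRangesList (only the element names matter here), and a hypothetical helper, clean_names(), that keeps just the 'NC_...' accession. The regular expression assumes every name contains exactly one 'NC_' followed by digits, as in the names printed above.

```r
# Stand-ins for the real objects; only the list element names are relevant here.
aastringsetlist <- list(
    processed_NC_004578_cds_protein_sequences = "placeholder",
    processed_NC_005773_cds_protein_sequences = "placeholder"
)
grangeslist <- list(
    processed_NC_004578_cds_features = "placeholder",
    processed_NC_005773_cds_features = "placeholder"
)

# Hypothetical helper: keep only the 'NC_<digits>' part of each name.
clean_names <- function(x) sub(".*(NC_[0-9]+).*", "\\1", x)

names(aastringsetlist) <- clean_names(names(aastringsetlist))
names(grangeslist) <- clean_names(names(grangeslist))

# Both lists should now carry the same names, in the same order.
identical(names(aastringsetlist), names(grangeslist))
```

The same two names<- assignments should work on the real list objects, since they only touch the list-level names and not the sequences or ranges inside.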
What you are describing in the issue (checking if sequence names in seq match gene IDs in annotation) is another step of the quality control, which is performed by the internal function check_gene_names(). Maybe you got confused there.
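To make the difference between the two checks concrete, here is a toy illustration in base R. It is not syntenet's actual implementation, only a sketch of the two ideas: the variables seq_names and gene_ids are hypothetical character vectors standing in for the sequence names and gene IDs of each species.

```r
# Toy stand-ins: per-species sequence names and gene IDs (hypothetical data).
seq_names <- list(NC_004578 = c("p1", "p2"), NC_005773 = c("q1", "q2"))
gene_ids  <- list(NC_004578 = c("p1", "p2"), NC_005773 = c("q1", "q3"))

# Idea behind the first check: do the two lists have the same element names?
same_list_names <- setequal(names(seq_names), names(gene_ids))

# Idea behind the second check: within each species, does every sequence
# name appear among the gene IDs?
ids_match <- vapply(
    names(seq_names),
    function(sp) all(seq_names[[sp]] %in% gene_ids[[sp]]),
    logical(1)
)

same_list_names
ids_match
```

In this toy data the list names agree, while the second species fails the per-gene comparison because "q2" has no matching gene ID.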
If you find that helpful, feel free to close the issue.
Best, Fabricio
Thank you for the help. Sorry for the misunderstanding. Renaming the raw files appropriately does indeed seem to have fixed the issue.
I have been trying to get syntenet working, but I can't seem to get past the check_input / check_list_names stage. My understanding is that it compares the header/name of the proteins in the FASTA/AAStringSet object with the "gene_id" in the GRanges object (from column 9 of the GFF file). But I can't get it to pass even when they are identical. The files I am working with are user-generated (i.e., not from a database): the FASTA header is the protein ID, and after failing to get it to pass we set every field in GFF column 9 to that same protein ID. I have tried using the "gene_field" option in check_input, setting it to other columns in the GRanges object, but it doesn't seem to help. I have also tried removing the ".1" from all the names in both the FASTA and GFF files, and that doesn't change anything. Any help appreciated.
I'm using the current version on Bioconductor (3.17).