haruosuz opened this issue 4 months ago
Hi @haruosuz, that is a little odd. Poorly formatted GFF files are typically removed at the initial stage. Were your samples annotated with Prokka? I would suggest checking the GFF file for the removed sample to make sure it is formatted correctly and contains CDS/genes (i.e. is not empty). If it looks normal, feel free to email me the file and I will check whether anything odd is going on (perhaps include a handful of the files that worked as well, for contrast).
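One way to run that check over every input file at once is a small shell loop. This is a minimal sketch, not part of PIRATE: `count_cds` is a hypothetical helper name, and it assumes standard tab-separated GFF3 in which column 3 holds the feature type and any embedded sequence follows a `##FASTA` line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PIRATE): for each GFF3 file given as an
# argument, print "<file><TAB><number of CDS features>", or EMPTY / NO_CDS
# when the file is zero bytes or contains no CDS features.
# Assumes tab-separated GFF3 with the feature type in column 3.
count_cds() {
  local gff n
  for gff in "$@"; do
    if [ ! -s "$gff" ]; then
      printf '%s\tEMPTY\n' "$gff"
      continue
    fi
    # Stop at the ##FASTA section so sequence lines are never read as features;
    # awk still runs the END block after "exit", so the count is always printed.
    n=$(awk -F'\t' '/^##FASTA/ { exit } !/^#/ && $3 == "CDS" { c++ } END { print c + 0 }' "$gff")
    if [ "$n" -eq 0 ]; then
      printf '%s\tNO_CDS\n' "$gff"
    else
      printf '%s\t%d\n' "$gff" "$n"
    fi
  done
}
```

For example, `count_cds *.gff | grep -E 'EMPTY|NO_CDS'` would list only the suspicious files.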
Thank you for your reply. The 322 genomes were annotated with DFAST, and none of the 322 GFF files is empty. However, PIRATE.pangenome_summary.txt reports "# 1415 gene families in 321 genomes."
You can check the headers in the PIRATE.gene_families.tsv file and compare them to your input sample list.
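A minimal bash sketch of that comparison follows. It assumes, as the diff command in this thread does by skipping the first 20 header fields, that the genome columns start at column 21; `compare_names` is a hypothetical helper name, not a PIRATE command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PIRATE): list names that occur in only one of
#   (a) the genome columns of a gene_families.tsv header, and
#   (b) a sample list with one genome name per line.
# Assumption: genome columns start at column 21; pass a third argument to change it.
compare_names() {
  local tsv=$1 list=$2 first_col=${3:-21}
  # comm needs both inputs sorted under the same collation, so pin LC_ALL=C.
  # Output column 1: only in the header; column 2 (tab-indented): only in the list.
  LC_ALL=C comm -3 \
    <(head -n 1 "$tsv" | tr '\t' '\n' | tail -n +"$first_col" | LC_ALL=C sort) \
    <(LC_ALL=C sort "$list")
}
```

Running `compare_names PIRATE.gene_families.tsv genome_list.txt` and getting no output would mean the two sides match name for name. Unlike a positional `diff`, `comm -3` pairs repeated names occurrence by occurrence, so a name listed twice on one side only would still show up.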
Thank you for your reply.
The following command produced no output, indicating that the genome names in the PIRATE.gene_families.tsv header match the input sample list in genome_list.txt:
diff <(head -n 1 PIRATE.gene_families.tsv | tr "\t" "\n" | tail -n +21 | sort) <(sort genome_list.txt)
The cause of the discrepancy (322 vs. 321 genomes) remains unclear. Here are the commands and their outputs:
$ wc -l genome_list.txt
322 genome_list.txt
$ head -n 1 PIRATE.pangenome_summary.txt
# 1415 gene families in 321 genomes.
So it found all of your input genome files, but one internal step reports a count that is off by one? Are you sure genome_list.txt doesn't contain a line with just whitespace?
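A quick way to answer that question is to scan the list for lines that look blank but are not empty, or that carry stray characters around a name. This is a minimal sketch; `audit_list` is a hypothetical helper, and the set of checks (spaces, tabs, Windows CR line endings) is an assumption about what could plausibly be miscounted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report lines of a sample list that are blank or
# whitespace-only, end with a carriage return (CRLF files), or have a
# leading/trailing space or tab around the name. Prints nothing for a clean list.
audit_list() {
  awk '
    /^[ \t\r]*$/    { print "line " NR ": blank or whitespace-only"; next }
    /\r$/           { print "line " NR ": ends with a carriage return (CRLF)" }
    /^[ \t]|[ \t]$/ { print "line " NR ": leading or trailing space/tab" }
  ' "$1"
}
```

Usage: `audit_list genome_list.txt`.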
$ cat genome_list.txt | wc -l
322
$ cat genome_list.txt | grep -v "^$" | wc -l
322
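Both counts agree, but counting lines cannot show whether a genome name appears twice, and `grep -v "^$"` only drops completely empty lines. A list with 322 lines but 321 distinct names is one possibility worth ruling out, although nothing in this thread establishes it. The sketch below (bash with awk; `count_names` is a hypothetical helper) trims spaces, tabs and carriage returns, skips blank lines, and reports total versus distinct names plus any duplicates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the total number of names in a list with the
# number of distinct names, printing each duplicated name once.
count_names() {
  awk '
    { gsub(/^[ \t\r]+|[ \t\r]+$/, "") }    # trim spaces, tabs and CRs
    $0 == "" { next }                       # skip blank lines
    { total++ }
    seen[$0]++ == 0 { distinct++ }          # first occurrence of a name
    seen[$0] == 2 { print "duplicate: " $0 } # second occurrence only
    END { print "total: " (total + 0); print "distinct: " (distinct + 0) }
  ' "$1"
}
```

Usage: `count_names genome_list.txt`; matching "total" and "distinct" values with no "duplicate:" lines would rule this explanation out.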
I ran PIRATE with 322 genomes (GFF files) as input, but PIRATE.pangenome_summary.txt reports 321 genomes. Is there any way to investigate why the count dropped from 322 to 321? Here is what I have checked so far: