AdmiralenOla closed this issue.
Actually, I was just about to suggest an alternative: allowing the user to specify column numbers to include in the output (so I can see the gene numbers of specific strains in the dataset in the Scoary output).
For now I am modifying the "Non-unique gene name" column for this and then splitting that one out.
Hi! Trying to wrap my head around this, but I don't quite see how it would work. I think I'm confused by "gene numbers of specific strains in the dataset". Do you mean grabbing columns from the input Roary file or producing some kind of aggregate column? Would you mind giving an example?
The way I envisage it is similar to the existing switch that tells Scoary to start counting from column 15 of the Roary output.
Say that these are the headers from a Roary output:
Gene
Non-unique Gene name
Annotation
No. isolates
No. sequences
Avg sequences per isolate
Genome Fragment
Order within Fragment
Accessory Fragment
Accessory Order with Fragment
QC
Min group size nuc
Max group size nuc
Avg group size nuc
Sample1
Sample2
Sample3
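A minimal Python sketch of what that column-15 switch means for the header above, assuming 1-based column numbering (as with Scoary's -s option). `split_header` is a hypothetical helper for illustration, not Scoary's own code:

```python
# The 17 header fields from the example above: 14 Roary metadata columns + 3 samples.
ROARY_HEADER = [
    "Gene", "Non-unique Gene name", "Annotation", "No. isolates",
    "No. sequences", "Avg sequences per isolate", "Genome Fragment",
    "Order within Fragment", "Accessory Fragment",
    "Accessory Order with Fragment", "QC", "Min group size nuc",
    "Max group size nuc", "Avg group size nuc",
    "Sample1", "Sample2", "Sample3",
]


def split_header(header, start_col=15):
    """Split a header into (metadata columns, sample columns).

    Hypothetical helper: start_col is 1-based, i.e. the number of the
    first column that holds per-sample information.
    """
    idx = start_col - 1  # convert the 1-based column number to a 0-based index
    return header[:idx], header[idx:]


meta, samples = split_header(ROARY_HEADER)
print(len(meta), samples)  # 14 ['Sample1', 'Sample2', 'Sample3']
```

With this numbering, Sample1, Sample2 and Sample3 are columns 15, 16 and 17.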
The Scoary output will contain the first 3 columns followed by the counts, etc.:
Gene
Non-unique gene name
Annotation
I would like a switch that lets me also include the information in the Sample1, Sample2 and/or Sample3 columns, something like "--columns_included 16,17,18". The group output of Roary is not always informative; the gene number can be.
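A rough sketch of how such a switch could pick extra columns out of each gene row, assuming a comma-separated list of 1-based column numbers. `parse_column_list` and `extra_columns` are hypothetical helpers, and the example row and gene IDs are made up for illustration; this is not how Scoary implements the feature.

```python
def parse_column_list(spec):
    """Parse a comma-separated list of 1-based column numbers, e.g. "16,17,18"."""
    cols = [int(x) for x in spec.split(",") if x.strip()]
    if any(c < 1 for c in cols):
        raise ValueError("column numbers are 1-based and must be >= 1")
    return cols


def extra_columns(header, row, cols):
    """Return (column name, value) pairs for the requested columns of one gene row."""
    return [(header[c - 1], row[c - 1]) for c in cols]


# Made-up example: a trimmed header and one gene row with per-sample gene IDs.
header = ["Gene", "Non-unique Gene name", "Annotation", "Sample1", "Sample2"]
row = ["group_1", "", "hypothetical protein", "S1_00012", "S2_00345"]
print(extra_columns(header, row, parse_column_list("4,5")))
# [('Sample1', 'S1_00012'), ('Sample2', 'S2_00345')]
```

Each selected (name, value) pair would then be appended to that gene's row in the output, so the per-strain gene numbers appear next to the group name.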
Will see whether I can upload an example.
OK, I think I understand what you mean now. Sure, I can implement that; it should be fairly easy! I will schedule it for the next release.
Cool! I aim to get you a lot of citations and help you increase your h-index ;-)
Hi @dutchscientist. This functionality is included in the latest version. Hope you like it!
Hi Ola, great! Will try it soon (currently travelling for a few weeks) :)
From: Ola Brynildsrud, 04 July 2017. Subject: Re: [AdmiralenOla/Scoary] Don't enforce "Non-unique gene name" and "Annotation" columns (#57)
Yes, this is great! Exactly what I wanted: --include_input_columns is just what I needed. Thanks very much!
Stop enforcing the columns "Non-unique gene name" and "Annotation" in the output. Some users might have an input file with only a single identifier column (Gene ID) before the sample information starts, and want to run with -s 2.
In the current version, this causes Scoary to fill the "Non-unique Gene name" and "Annotation" columns with sample data, because it automatically assumes that this information is in columns 2 and 3. There is really no need to enforce any columns other than Gene ID.
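One possible way to avoid the problem, sketched in Python under the assumption that the start column is 1-based (so -s 2 means a single Gene ID column before the samples): only the Gene ID is required, and columns 2 and 3 are read only if they come before the first sample column. `gene_metadata` is a hypothetical helper, not Scoary's actual fix.

```python
def gene_metadata(row, start_col):
    """Build output metadata for one gene row without assuming columns 2 and 3.

    Only the Gene ID (column 1) is required. "Non-unique gene name" and
    "Annotation" are taken from columns 2 and 3 only when those columns lie
    before the first sample column (start_col, 1-based); otherwise they are
    left blank instead of being filled with sample data.
    """
    n_meta = start_col - 1  # number of identifier/metadata columns
    if n_meta < 1:
        raise ValueError("start_col must be >= 2 so that a Gene ID column exists")
    gene_id = row[0]
    name = row[1] if n_meta >= 2 else ""
    annotation = row[2] if n_meta >= 3 else ""
    return {"Gene": gene_id, "Non-unique gene name": name, "Annotation": annotation}


# Input with only a Gene ID column before the samples, run with start column 2:
print(gene_metadata(["geneA", "1", "0", "1"], start_col=2))
# {'Gene': 'geneA', 'Non-unique gene name': '', 'Annotation': ''}
```

Leaving the optional fields blank (or dropping them) keeps sample presence/absence values out of the identifier columns.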