Continuous_traits - Githubissues

bloomarun commented 1 year ago

Hello! I have Roary output (gene_presence_absence.csv) for about 450 isolates belonging to a single species, and I have a traits file. The traits are not binary, though; they are continuous. The preprint says that Scoary2 can work with continuous traits. How do I need to format my traits file, and which flag tells Scoary2 that my data is continuous? P.S.: We are talking about something like the color of a petal, where there is incomplete dominance. The flower can be pink (dominant), white (recessive) or yellow (hybrid). How do I pass these as traits?

MrTomRod commented 1 year ago

Hi @bloomarun

Numeric traits

Yes, Scoary2 can binarize them for you. This is described in Wiki > Inputs. Are the instructions clear enough?

Flower colors

I don't follow: dominant and recessive are terms normally used in the context of polyploid genomes. Scoary2 assumes clonal reproduction, so it may be the wrong tool for you!

My initial idea for encoding the color would be to create one binary "trait" per color, for example:

Trait	pink	white	yellow
isolate-1	1	0	0
isolate-2	0	0	1
isolate-3	0	1	0
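
The per-color encoding above can also be generated from a single categorical column. The following is a minimal sketch using only the Python standard library. The isolate names and colors are the illustrative ones from the table, and `one_hot_traits` is a hypothetical helper, not part of Scoary2.

```python
import csv
import io


def one_hot_traits(rows, categories):
    """Turn (isolate, category) pairs into TSV text with one 0/1 column per category."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Trait", *categories])
    for isolate, value in rows:
        if value not in categories:
            raise ValueError(f"unknown category {value!r} for {isolate}")
        # 1 in the column matching the isolate's category, 0 elsewhere
        writer.writerow([isolate, *(int(value == c) for c in categories)])
    return out.getvalue()


# Illustrative data taken from the example table above
colors = [("isolate-1", "pink"), ("isolate-2", "yellow"), ("isolate-3", "white")]
table = one_hot_traits(colors, ["pink", "white", "yellow"])
print(table, end="")
```

Saved to a file, the output has the same layout as the tab-separated table above, with each color treated as its own binary trait.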

bloomarun commented 1 year ago

Hello @MrTomRod, thanks for the reply. Okay, this is the scenario: I am trying to look at antibiotic resistance patterns in a dataset of bacterial genomes. My gene input is Roary's gene_presence_absence.csv, and the traits file contains antibiotic susceptibility data, with each isolate classed as susceptible (S), resistant (R) or intermediate (I). Can I quantize them as S=0, I=0.5, R=1?

Also, when I tried to run Scoary2 with the data described above, I got the following error while parsing the genes file:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3592, saw 13

Command run:

scoary2 gene_presence_absence.csv --gene-data-type 'gene-list:,' --traits traits.csv --trait-data-type 'gaussian:kmeans:,' --n-cpus 96 --outdir scoary2_out

(Both are .csv files.)

MrTomRod commented 1 year ago

the traits file has data of Antibiotic susceptibility with Susceptible(S) , Resistant(R) or Intermediate resistant(I). Can I quantize them as S=0, I=0.5, R=1?

You can, but Scoary2 will simply binarize your data. It is better to do that manually in your case, imo.
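
Binarizing manually means deciding explicitly what an intermediate (I) call should count as. The sketch below, standard library only, produces both readings side by side. The column names `resistant_strict` and `resistant_lenient`, the isolate names, and the helper `binarize_sir` are assumptions made for illustration. Which reading is appropriate is a biological decision that this thread does not settle.

```python
def binarize_sir(calls, intermediate_as):
    """Map S/I/R calls to 0/1; `intermediate_as` (0 or 1) decides what 'I' becomes."""
    mapping = {"S": 0, "R": 1, "I": intermediate_as}
    return {isolate: mapping[call.strip().upper()] for isolate, call in calls.items()}


# Hypothetical susceptibility calls for three isolates
calls = {"isolate-1": "S", "isolate-2": "I", "isolate-3": "R"}

strict = binarize_sir(calls, intermediate_as=0)   # only R counts as resistant
lenient = binarize_sir(calls, intermediate_as=1)  # I and R both count as resistant

# Emit a tab-separated traits table with one binary column per reading
print("Trait\tresistant_strict\tresistant_lenient")
for isolate in calls:
    print(f"{isolate}\t{strict[isolate]}\t{lenient[isolate]}")
```

Keeping both columns in one traits file would let the two definitions be tested as separate binary traits, rather than leaving the cut-off to an automatic binarization of 0, 0.5 and 1.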

pandas.errors.ParserError

Can you send me the dataset?
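
The message "Expected 1 fields in line 3592, saw 13" means pandas settled on a one-column layout from the earlier lines and then met a line that splits into 13 fields. Before sharing a dataset, a quick field count per line can show whether the chosen delimiter fits the file. The following is a rough sketch with Python's `csv` module. `field_count_report` is a hypothetical helper, and the inline sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def field_count_report(handle, delimiter):
    """Count fields per row; also list the first few rows that differ from the header."""
    reader = csv.reader(handle, delimiter=delimiter)
    counts = Counter()
    odd_rows = []
    expected = None
    for row in reader:
        if expected is None:
            expected = len(row)  # the header row defines the expected width
        counts[len(row)] += 1
        if len(row) != expected and len(odd_rows) < 5:
            odd_rows.append((reader.line_num, len(row)))
    return counts, odd_rows


# Tiny stand-in for a genes file; the last row is one field short
sample = io.StringIO("Gene,iso1,iso2\ngeneA,1,0\ngeneB,1\n")
counts, odd_rows = field_count_report(sample, ",")
print(counts, odd_rows)
```

For a real file, the handle would come from `open("gene_presence_absence.csv", newline="")`. If nearly every row reports a single field, the delimiter passed to Scoary2 probably does not match the file.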

Fatma116 commented 1 year ago

Hello @MrTomRod, I am trying to run Scoary2 using the following command:

scoary2 --genes /project/genomics/fatma/B1_vs_plant_vs_soil_vs_human_vs_aquatic/orthofinder/Orthofinder_prokka/OrthoFinder/Results_Jul25/Phylogenetic_Hierarchical_Orthogroups/N0.tsv --genes-data-type 'gene-list:\t' --gene-info N0_best_names.tsv --traits traits.tsv --trait-data-type 'binary:\t' --n-cpus 16 --outdir output

I am using the raw output file N0, and the traits file is binary: the traits are four groups representing the origin of the strain (plant, soil, human and aquatic), and each isolate is given 1 for the group to which it belongs and 0 for the other groups. But I am getting this error:

Loading traits...
Loading genes...
Welcome to Scoary2! (0.0.11)
Traceback (most recent call last):
  File "/home/comi/fatma.mahmoud/venv/bin/scoary2", line 8, in <module>
    sys.exit(main())
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/scoary/scoary.py", line 289, in main
    fire.Fire(scoary)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/scoary/scoary.py", line 112, in scoary
    genes_orig_df, genes_bool_df = load_genes(
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/scoary/load_genes.py", line 142, in load_genes
    genes_orig_df, genes_bool_df = load_gene_count_file(genes, delimiter, restrict_to, ignore)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/scoary/load_genes.py", line 45, in load_gene_count_file
    count_df = pd.read_csv(path, delimiter=delimiter, index_col=0)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    return parser.read(nrows)
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/comi/fatma.mahmoud/venv/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 6, saw 4

I really appreciate your help

MrTomRod commented 1 year ago

@Fatma116 The reason is that your argument is --genes-data-type, but it should be --gene-data-type!
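
The traceback fits this explanation. It goes through load_gene_count_file even though 'gene-list:\t' was requested, which suggests the misspelled option was not applied. A small pre-flight check can catch such typos before a long run. The sketch below uses `difflib` from the standard library. `KNOWN_FLAGS` contains only the options that appear in this thread, not Scoary2's complete option list, and `suspicious_flags` is a hypothetical helper.

```python
import difflib

# Options seen in this thread only; NOT the full Scoary2 option list (assumption).
KNOWN_FLAGS = (
    "--genes", "--gene-data-type", "--gene-info", "--traits",
    "--trait-data-type", "--n-cpus", "--outdir",
)


def suspicious_flags(argv, known=KNOWN_FLAGS):
    """Return {unknown flag: closest known flag, or None} for each --option in argv."""
    report = {}
    for token in argv:
        if token.startswith("--"):
            flag = token.split("=", 1)[0]  # also handles --flag=value
            if flag not in known:
                match = difflib.get_close_matches(flag, known, n=1)
                report[flag] = match[0] if match else None
    return report


argv = ["--genes", "N0.tsv", "--genes-data-type", "gene-list:\t", "--traits", "traits.tsv"]
print(suspicious_flags(argv))
```

Run on the arguments of the failing command, the check reports `--genes-data-type` and suggests `--gene-data-type` as the closest known option.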

Fatma116 commented 1 year ago

@MrTomRod Sorry for this stupid mistake. Though I revised the code many times, I didn't notice it, but it is working now. Thanks a lot for your help.

MrTomRod commented 1 year ago

@bloomarun

Does the problem persist or can I close the issue?

bloomarun commented 1 year ago

Yes, you can close the issue. I will get back to you if I have any other queries. Thank you for your time and effort.


MrTomRod / scoary-2

Continuous_traits #4
