Busydog1990 / CProtMEDIAS


What data do I need to prepare before using this R package? #1

Open tytrhr opened 1 year ago

tytrhr commented 1 year ago

Hello, I looked through the workflow, but I am a little confused about the first step and do not know how to start. Can you give me some advice?

Busydog1990 commented 1 year ago

Thank you for using CProtMEDIAS.

First, install R (a version below 4.2.0). Then install CProtMEDIAS with:

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

devtools::install_github("Busydog1990/CProtMEDIAS")
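Before installing, it may help to confirm that the running R session matches the version recommendation above. The following is only a minimal sketch in base R; the variable names `r_version` and `r_ok` are illustrative and not part of CProtMEDIAS.

```r
# Compare the running R version with the recommended upper bound (below 4.2.0).
r_version <- getRversion()
r_ok <- r_version < "4.2.0"

if (r_ok) {
  message("R ", r_version, " is below 4.2.0.")
} else {
  message("R ", r_version, " is 4.2.0 or newer; the package may not install or run as expected.")
}
```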

You can see the workflow with:

vignette("genepro")

#############

To start your own analysis, first prepare a set of amino acid sequences, preferably homologous ones. Then align them with a sequence alignment tool such as MAFFT, ClustalW, or MUSCLE, and save the result in FASTA format (recommended). Finally, read the alignment into R and follow the workflow above.
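The reading step described above can be sketched roughly as follows. This is an illustration, not the maintainer's exact code: the toy alignment, the temporary file, and the variable names `aln_file` and `aln` are all hypothetical, and it assumes the Bioconductor package Biostrings is already installed.

```r
# Biostrings is a Bioconductor package; if it is missing, it can be installed
# with BiocManager::install("Biostrings").
library(Biostrings)

# Hypothetical toy alignment written to a temporary file. In practice this
# would be the FASTA output of an aligner, for example:
#   mafft --auto seqs.fasta > aligned.fasta
aln_file <- tempfile(fileext = ".fasta")
writeLines(c(
  ">seq1",
  "MKR-LSEE",
  ">seq2",
  "MKRALSEE",
  ">seq3",
  "MK--LSDE"
), aln_file)

# Read the aligned amino acid sequences as an AAStringSet object.
aln <- readAAStringSet(aln_file)

# Sequence names come from the ">" header lines of the FASTA file.
print(names(aln))

# In an alignment, every sequence should have the same width (gaps included).
stopifnot(length(unique(width(aln))) == 1)
```

The final check is just a quick sanity test that the file really contains aligned sequences rather than raw, unaligned ones.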

tytrhr commented 1 year ago

Hi, is Homeobox_small your sequence data? In my FASTA file the name is in the first column, but in yours the name is on the first line. How do I get a file in this format? Could you briefly describe the process? Also, is readAAStringSet the function you use to read the file? Looking forward to your reply.

Busydog1990 commented 1 year ago

A FASTA-format sequence file is sufficient. The file is read with readAAStringSet from the R package Biostrings, which loads an amino acid FASTA file into R as an AAStringSet object. Homeobox_small is a list that combines multiple AAStringSet objects. You can create such a list with list() in R, or use lapply() to read several FASTA files at once.
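As a rough sketch of how such a list might be built, assuming Biostrings is installed: the file names, the toy sequences, and the variables `aln_dir`, `fasta_files`, `aln_list`, and `aln_list2` are hypothetical. Naming the list elements is a convenience here, not a documented requirement of CProtMEDIAS.

```r
library(Biostrings)

# Hypothetical example data: two small aligned FASTA files in a temp directory.
aln_dir <- tempfile("alignments_")
dir.create(aln_dir)
writeLines(c(">a1", "MKR-L", ">a2", "MKRAL"), file.path(aln_dir, "family1.fasta"))
writeLines(c(">b1", "GSTPP", ">b2", "GS-PP"), file.path(aln_dir, "family2.fasta"))

# Option 1: read every FASTA file in the directory with lapply().
fasta_files <- list.files(aln_dir, pattern = "\\.fasta$", full.names = TRUE)
aln_list <- lapply(fasta_files, readAAStringSet)
names(aln_list) <- tools::file_path_sans_ext(basename(fasta_files))

# Option 2: combine individually read AAStringSet objects with list().
aln_list2 <- list(
  family1 = readAAStringSet(fasta_files[1]),
  family2 = readAAStringSet(fasta_files[2])
)

# Each list element is an AAStringSet holding one alignment.
print(sapply(aln_list, length))
```

Either option gives a plain R list whose elements are AAStringSet objects, which is the structure described for Homeobox_small.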