Open marianamazzochi opened 1 year ago
This is somewhat out of the scope of bug reports about the Phyluce software. There are also lots of options you could pursue that are specific to your project and what you are trying to do with your data. Because you have fasta files for a set of individuals now, you can select one individual as your "reference" fasta and call SNPs for the individuals in each population against that reference. One way that we do this in my lab is detailed here: http://protocols.faircloth-lab.org/en/latest/protocols-computer/analysis/analysis-gatk-parallel.html. That said, there are many different ways to do the same sorts of things...
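To make the "pick one individual as the reference" idea concrete, here is a minimal sketch that only prints one possible set of commands, so they can be reviewed before anything is run. It is not the GATK workflow from the linked protocol: it swaps in bwa, samtools, and bcftools as an assumption, and the file names (`reference-individual.fasta`, `<sample>-READ1.fastq.gz`, `<sample>-READ2.fastq.gz`) and the function name `print_snp_commands` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) commands for calling SNPs against the
# contigs of one individual chosen as the reference.
# Assumed tools: bwa, samtools, bcftools (not the GATK protocol linked above).
# Assumed inputs (hypothetical names): <sample>-READ1.fastq.gz / -READ2.fastq.gz

print_snp_commands() {
    local ref="$1"; shift
    # The reference only needs to be indexed once.
    echo "bwa index ${ref}"
    local sample
    for sample in "$@"; do
        # Map the trimmed reads to the reference and sort the alignments.
        echo "bwa mem ${ref} ${sample}-READ1.fastq.gz ${sample}-READ2.fastq.gz | samtools sort -o ${sample}.bam -"
        echo "samtools index ${sample}.bam"
        # Call variants; heterozygous sites are retained in the VCF.
        echo "bcftools mpileup -f ${ref} ${sample}.bam | bcftools call -mv -Oz -o ${sample}.vcf.gz"
    done
}

print_snp_commands reference-individual.fasta sample1 sample2
```

Printing the commands first makes it easier to adapt paths and tool versions to a given environment before committing compute time.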
Thanks, Brant. I am following the pipeline you mentioned, but I think the trimming part contains a bug: the 'module' command is not working. I searched for solutions, and it appears that 'module' no longer works because the set_shell_startup configuration is now disabled by default in this environment-modules update. In fact, I can't even install environment-modules; the attempt outputs 'Invalid operation'.
So, I am stuck at this part of the pipeline:
export CORES_PER_JOB=4
module load jdk/1.8.0_161
module load gnuparallel/20170122
cd $PBS_O_WORKDIR
export JOBS_PER_NODE=$(($PBS_NUM_PPN / $CORES_PER_JOB))
parallel --colsep '\,' \
    --progress \
    --joblog logfile.trimmomatic.$PBS_JOBID \
    -j $JOBS_PER_NODE \
    --slf $PBS_NODEFILE \
    --workdir $PBS_O_WORKDIR \
    -a files-to-trim.txt \
    ./trimmomatic-sub.sh {$1} {$2}
Do you know a different way to call java and parallel that replaces the 'module' command? Thanks again,
Howdy,
These are just examples of how you might go about running these types of analyses - they are written for our particular HPC environment. As a result, they'll need to be modified for your particular environment in order to run correctly (you should also have all the trimmomatic parts already run from the phyluce pipeline).
The important commands to focus on are those running particular programs, usually at the bottom of each script that I sent you. You should be able to modify these to work with whichever environment you are using for analysis.
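As a rough illustration of that kind of modification, the sketch below runs the same step on a single machine without `module` or PBS. It assumes java and GNU parallel were installed some other way (a package manager or conda, for example) and are already on PATH. The `getconf` fallback for the core count, the local joblog name, and reading the original `{$1} {$2}` as GNU parallel's `{1} {2}` column placeholders are all assumptions, not part of the original protocol.

```shell
#!/usr/bin/env bash
# Sketch: run the parallel step without `module load` or PBS variables.
# Assumes java and GNU parallel are already installed and on PATH.

CORES_PER_JOB=4
# Use the PBS value when it exists; otherwise ask the OS for the core count.
TOTAL_CORES="${PBS_NUM_PPN:-$(getconf _NPROCESSORS_ONLN)}"
JOBS_PER_NODE=$(( TOTAL_CORES / CORES_PER_JOB ))
# Always allow at least one job, even on a machine with fewer than 4 cores.
if [ "$JOBS_PER_NODE" -lt 1 ]; then JOBS_PER_NODE=1; fi

# Check for the tools directly instead of calling `module load`.
missing=0
for tool in java parallel; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool -> $(command -v "$tool")"
    else
        echo "missing: $tool (install it or add it to PATH)"
        missing=1
    fi
done

# Only launch jobs when both tools and the input list are present.
# --slf and --workdir are dropped because everything runs on one machine.
if [ "$missing" -eq 0 ] && [ -f files-to-trim.txt ]; then
    parallel --colsep ',' \
        --progress \
        --joblog logfile.trimmomatic.local \
        -j "$JOBS_PER_NODE" \
        -a files-to-trim.txt \
        ./trimmomatic-sub.sh {1} {2}
fi
```

On a cluster with a different scheduler, the same pattern applies: replace the PBS variables with that scheduler's equivalents and keep the program call at the bottom.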
-b
Brant, I really do have the trimmomatic parts performed by phyluce, but it seems that I removed all ambiguities by following that pipeline. I need a file that contains specific ambiguity codes, like Y and R. Can I get that using the files I obtained from the phyluce pipeline?
If you follow the standard pipeline (e.g. Tutorial 1), the results output by that approach do not contain variable positions (e.g. Y, R, and the other IUPAC base codes). They are not meant to, because variable positions can cause problems in some phylogenetic analysis programs.
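For readers unfamiliar with these codes, the small sketch below shows what a code like Y or R stands for: a site where an individual carries two different bases. The function name `iupac` is hypothetical and covers only two-base combinations; it is an illustration of the standard IUPAC nucleotide codes, not something Phyluce provides.

```shell
#!/usr/bin/env bash
# Sketch: map the two alleles at one site to a single IUPAC base code.
# `iupac` is a hypothetical helper name used only for illustration.

iupac() {
    # Sort the two bases so that "T C" and "C T" produce the same key.
    local pair
    pair=$(printf '%s\n' "$1" "$2" | sort | tr -d '\n')
    case "$pair" in
        AA|CC|GG|TT) printf '%s\n' "${pair:0:1}" ;;  # homozygous: plain base
        AG) echo R ;;   # purines
        CT) echo Y ;;   # pyrimidines
        CG) echo S ;;
        AT) echo W ;;
        GT) echo K ;;
        AC) echo M ;;
        *)  echo N ;;   # anything else is treated as unknown
    esac
}

iupac C T   # prints Y
iupac G A   # prints R
iupac A A   # prints A
```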
If you need alignments with variable positions (or VCF files with variant bases), then you will need to treat your data in a "custom" way - meaning that you will likely have to move outside of what is supported by (and described for) Phyluce. One way to do that is using an approach similar to what I described above. Another way to do that could be to try the phasing pipeline, but that still may not produce the exact data that you need.
In short, what you need to do is somewhat specific to the goals of your project - and you'll need to decide which method best achieves those goals and how to implement that method (or those methods).
Dear Brant, My project involves 6 populations of a seabird species. I am aiming to estimate their genetic structure and other parameters. I used your pipeline for trimming and assembling my fastq data, as we discussed before. However, I have now realized that I have obtained data for haploid individuals, without any ambiguities. I don't know if I've missed some part of the tutorial, but I couldn't find out how to maintain the ambiguities (or obtain two sequences per individual, with all alleles for that individual). Could you please help me?
Cheers,