WeichenZhou / PALMER

Pre-mAsking Long reads for Mobile Element inseRtion
MIT License

PALMER2

Required resources:

 samtools/1.3.1  https://github.com/samtools/samtools
 ncbi-blast+/2.10.0  ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ (Earlier versions introduce fatal bugs.)

Getting started

Download and Install

git clone https://github.com/WeichenZhou/PALMER.git
cd PALMER
make

We recommend using ncbi-blast+/2.10.0 and running individual chromosomes in parallel for the best performance.
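The per-chromosome recommendation above can be sketched as a dry run that only prints one PALMER command per chromosome. Every path below is a placeholder, and giving each chromosome its own working directory and output prefix is a cautious assumption made here so that parallel runs cannot overwrite each other; the README itself does not require it.

```shell
#!/bin/sh
# Dry-run sketch: print one PALMER command per chromosome (GRCh38 naming).
# BAM, REF and WORK are hypothetical placeholders; override them as needed.
BAM="${BAM:-/path/to/sample.bam}"
REF="${REF:-/path/to/GRCh38.fa}"
WORK="${WORK:-/path/to/workdir}"

palmer_cmd() {
  # $1 = chromosome name exactly as it appears in the reference (e.g. chr19).
  # Assumption: one working directory and one output prefix per chromosome.
  printf './PALMER --input %s --workdir %s/%s/ --ref_ver GRCh38 --output sample_%s --type LINE --mode raw --chr %s --ref_fa %s\n' \
    "$BAM" "$WORK" "$1" "$1" "$1" "$REF"
}

# Nothing is executed here; the loop only prints the commands for review.
for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
  palmer_cmd "chr$c"
done
```

The printed lines can then be reviewed and dispatched by whatever is available (a scheduler array job, GNU parallel, or `xargs -P`); the per-chromosome working directories would need to exist first.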

Parameters

Required:

--input
         aligned long-read sequencing BAM file with directory path

--workdir
         the user's working directory. Please follow the format /your/working/directory/ !!don't forget the last '/'!!

--ref_ver (options: hg19, GRCh37, GRCh38 or other)
         reference genome used for the aligned file (use 'other' for a customized genome other than hg19, GRCh37 or GRCh38)

--ref_fa
         indexed reference genome fasta file (with directory path) that was used to align the bam/cram file (using a different reference will cause errors)

--type (options: LINE, ALU, SVA, HERVK, or CUSTOMIZED (if you want to set up your customized sequence))
         type of MEIs or other kinds of insertions to detect

--mode (options: raw/ccs, or asm)      
         type of input sequencing to be processed (raw: raw nanopore/PacBio reads; asm: assembled contigs)

--chr (default: ALL (for whole genome, not recommended); options: chromosome1, chromosome2, ...chromosomeY)
         chromosome name for PALMER to run on. !!The chromosome name must match the naming used in the reference genome version!! e.g. for GRCh37, to run PALMER on chromosome 1, the option should be '1', while for GRCh38 it should be 'chr1'
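Two of the requirements above (the trailing '/' on --workdir and a --chr value that matches the reference naming) are easy to get wrong, so a small pre-flight check may help. The sketch below is not part of PALMER; the helper names are hypothetical, and it assumes the reference was indexed with samtools faidx, whose .fai file lists sequence names in its first tab-separated column.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with PALMER.

# Print the path with the trailing '/' that --workdir expects.
with_trailing_slash() {
  case "$1" in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

# Succeed only if chromosome name $1 is listed in the FASTA index $2
# (the .fai file; column 1 holds the sequence names). Names are compared
# as strings, so '1' and 'chr1' are treated as different.
chr_in_fai() {
  awk -F'\t' -v name="$1" '($1 "") == (name "") { found = 1 } END { exit found ? 0 : 1 }' "$2"
}

# Example use (placeholder paths):
#   WORKDIR=$(with_trailing_slash /your/working/directory)
#   chr_in_fai chr1 /path/to/GRCh38.fa.fai || echo "chr1 not found in reference" >&2
```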

Optional:

--start (default: Null)
         start position in the genome for PALMER to run. !!Must be used together with --end

--end (default: Null)      
         end position in the genome for PALMER to run. !!Must be used together with --start

--custom_seq (default: Null)
         .fasta file (with directory path) of the customized sequence to search for, e.g. NUMTs or MEIs in other species.

--TSD_finding (Fixed: TRUE for all MEIs, or default: FALSE for CUSTOMIZED insertion)
         whether to run TSD motif finding module for your insertion calling

--len_custom_seq (MUST be set when activating TSD_finding for CUSTOMIZED insertion, otherwise CLOSED)
         integer value for the length of your customized sequence WITHOUT the polyA tract

--L_len (default: 25bp)
         the minimum length of putative LINE-1 aligned to L1.3 sequences

--output (default: output)
         the prefix of the output file
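For --len_custom_seq, the value is the length of the customized sequence without its polyA tract. One way to get a starting number is sketched below; it is not part of PALMER and rests on two assumptions: the --custom_seq file holds a single FASTA record, and the polyA tract is simply the run of A bases at the 3' end. The result should be checked against the known structure of the element.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with PALMER.
# Prints the length of the sequence in FASTA file $1 after removing the
# trailing run of A bases. Assumes a single-record FASTA.
len_without_polya() {
  grep -v '^>' "$1" |          # drop the header line
    tr -d ' \t\r\n' |          # join sequence lines into one string
    tr 'acgtn' 'ACGTN' |       # normalize to upper case
    sed 's/A*$//' |            # strip the trailing A run (assumed polyA tract)
    awk '{ print length($0) }'
}

# Example use (placeholder path):
#   len_without_polya /path/to/L1MdA_consensus.fa
```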

Examples

1) Running PALMER on example PacBio raw reads bam file under the 'example' folder to call LINE-1 insertions on GRCh38 genome
./PALMER --input $PALMER_Path/example/sample.bam --workdir $DirPath/ --ref_ver GRCh38 --output sample --type LINE --mode raw --chr chr19 --ref_fa $your.reference.file.path/GRCh38.fa

Results (sample_calls.txt & sample_TSD_reads.txt) from the example bam file can also be found under the 'example' folder.
2) Running PALMER on your aligned sequences on GRCh37 reference genome to call LINE-1 insertions on chromosome 3 between positions 200,000 and 400,000
./PALMER --input $DirPath/your.bam.file --workdir $DirPath/ --ref_ver GRCh37 --output your.output.prefix --type LINE --mode raw --chr 3 --start 200000 --end 400000 --ref_fa $your.reference.file.path/hs37d5.fa
3) Running PALMER on your aligned assembled contigs in cram based on GRCh38 reference genome to call SVA insertions in chromosome3
./PALMER --input $DirPath/your.cram.file --workdir $DirPath/ --ref_ver GRCh38 --output your.output.prefix --type SVA --mode asm --chr chr3 --ref_fa $your.reference.file.path/GRCh38.fa
4) Running PALMER on your aligned bam to call Alu insertions in chromosome2a of the Chimpanzee genome
./PALMER --input $DirPath/your.bam.file --workdir $DirPath/ --ref_ver other --output your.output.prefix --type ALU --mode raw --chr chr2a(chr.name.based.on.your.reference.fa) --ref_fa $your.reference.file.path/your.reference.fa 
5) Running PALMER on your aligned bam to call NUMTs in chromosome5 of the Chimpanzee genome
./PALMER --input $DirPath/your.bam.file --workdir $DirPath/ --ref_ver other --output your.output.prefix --chr chr5 --mode raw --ref_fa $your.reference.file.path/your.reference.fa --type CUSTOMIZED --custom_seq $your.custom_seq.file.path/Clint.mt 
6) Running PALMER on your aligned bam to call LINE-1 insertions in chromosomeX of the mouse genome
./PALMER --input $DirPath/your.bam.file --workdir $DirPath/ --output your.output.prefix --chr chrX --ref_ver other --mode raw --ref_fa $your.reference.file.path/your.reference.fa --type CUSTOMIZED --custom_seq $your.custom_seq.file.path/L1MdA_consensus.fa --TSD_finding TRUE --len_custom_seq (int)
7) A callset of non-reference L1Hs in HG002, HG003, and HG004 [a Personal Genome Project trio from the Genome in a Bottle (GIAB) Consortium], generated using PALMER, is available under:
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_PALMER_11242017/

Output and Notes

PALMER writes two output files: 'output_calls.txt' & 'output_TSD_reads.txt'.

'output_calls.txt' is the summary of all non-reference MEI calls.

'output_TSD_reads.txt' contains the details of the high-confidence (HC) supporting reads (SRs).

Please check the files in the example folder for the title (meaning) of each output column.
Please apply a cutoff on 'Potential_supporting_reads' and 'Confident_supporting_reads' to any 'calls.txt' output to filter out false positive hits.
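The cutoff on supporting reads can be applied with a small awk filter. The sketch below is an illustration, not a PALMER tool: it assumes the calls file is tab-delimited, that its first line is a header spelling the two column names exactly as quoted above, and the thresholds in the usage comment are arbitrary placeholders rather than recommended values.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with PALMER.
# $1 = calls file, $2 = minimum Potential_supporting_reads,
# $3 = minimum Confident_supporting_reads.
filter_calls() {
  awk -F'\t' -v minp="$2" -v minc="$3" '
    NR == 1 {
      # Locate the two columns by their header names.
      for (i = 1; i <= NF; i++) {
        if ($i == "Potential_supporting_reads") p = i
        if ($i == "Confident_supporting_reads") c = i
      }
      if (!p || !c) { print "required columns not found" > "/dev/stderr"; exit 2 }
      print   # keep the header line
      next
    }
    # Keep rows that meet both thresholds (numeric comparison).
    ($p + 0) >= (minp + 0) && ($c + 0) >= (minc + 0)
  ' "$1"
}

# Example use (thresholds are placeholders, pick values suited to your coverage):
#   filter_calls sample_calls.txt 5 2 > sample_calls.filtered.txt
```

Looking the columns up by name rather than by position keeps the filter working if the column order differs between PALMER versions.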

Citation

For general use or LINE-1s:

For all MEIs:

For PALMER2.0: In preparation!

Contact

Logs

Ver2.0.0 May.20th.2022! PALMER2.0 is online now!! 520 (。・ω・。)ノ♥♥♥!!

Ver1.7.2 Nov.28th.2020! Happy Thanksgiving!!

Ver1.7 Nov.11th.2020! Happy Singles Day & happy shopping!!

Ver1.6.2.Enhanced Sep.27th.2020 by Jixing Guan

Ver1.6.2 May.19th.2020

Ver1.6.1 May.19th.2020

Ver1.6 May.11th.2020

Ver1.5.1 May.7th.2020

Ver1.5 May.4th.2020 "MAY THE FORCE BE WITH YOU!"

Ver1.4.1 Nov.14th.2019

Ver1.4 Feb.27th.2019

Ver1.3.3 Feb.3rd.2019 ^^^( ̄(oo) ̄)^^^ Happy Lunar New Year! Year of the Pig!! ^^^( ̄(oo) ̄)^^^

Ver1.3.0 Dec.22nd.2018

Ver1.3 Dec.5th.2018

Ver1.2 Sep.5th.2018

Ver1.1 Apr.24th.2018

Ver1.0 Feb.14th.2018