KiranJavkar / PRAWNS

PRAWNS: A fast and scalable bioinformatics tool that generates an efficient pan-genome representation of closely related whole genomes to provide a concise list of genomic features
GNU General Public License v3.0

How to represent a pangenome using this tool #1

Open GeorgeBGM opened 8 months ago

GeorgeBGM commented 8 months ago

Hi, I have built a graph pan-genome. How should I use the tool to characterize it and dig deeper into the information it contains? Or is there another analysis tool you would recommend?

Best, Du

KiranJavkar commented 8 months ago

Hi Du,

Thanks a lot for using PRAWNS and reaching out with your query! The output folder generated by PRAWNS should contain a couple of FASTQ files: metablocks.fastq and retained_blocks.fastq. Depending on the downstream analysis you want to pursue, you can use these FASTQ files, primarily metablocks.fastq, to perform alignments or other related comparisons.

The PRAWNS manuscript provides a few instances of these use cases: https://academic.oup.com/bioinformatics/article/39/1/btac844/6965020 (Section 3.4)

For instance, you can identify the conserved regions of higher interest, say those with a length of at least 500 bp, and BLAST them using NCBI's web BLAST search against the nr database, or against a more specialized database such as one of antimicrobial genes.
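
The length filter described above could be sketched roughly as follows. This is a minimal illustration and not part of PRAWNS: it assumes metablocks.fastq uses the standard four-line FASTQ record layout, and the function names and the output file name are hypothetical.

```python
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs, assuming four lines per FASTQ record."""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line (ignored)
            handle.readline()  # quality line (ignored)
            if header.startswith("@"):
                yield header[1:].strip(), sequence


def write_long_blocks(fastq_path: str, fasta_path: str, min_length: int = 500) -> int:
    """Write records with sequence length >= min_length as FASTA; return how many were kept."""
    kept = 0
    with open(fasta_path, "w") as out:
        for record_id, sequence in read_fastq(fastq_path):
            if len(sequence) >= min_length:
                out.write(f">{record_id}\n{sequence}\n")
                kept += 1
    return kept
```

Calling `write_long_blocks("metablocks.fastq", "metablocks_min500.fasta")` would produce a FASTA file that can be uploaded to a BLAST search; the 500 bp threshold is simply the example value mentioned above.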

Once you detect some functionally important conserved regions or paired regions, you can go back to the presence-absence and coords (coordinates) CSV files to identify the genomes where these regions exist and the genomic context of their presence.

Please let me know if this helps or if you have further questions. If you need specific assistance with your problem, please reach out to me by email and we can discuss it in detail.

Thanks again, -Regards, Kiran Javkar

GeorgeBGM commented 7 months ago

Hi Kiran Javkar.

Very nice work. PRAWNS is designed to analyse genomic FASTA data directly, so how can I use it to analyse human pan-genome GFA data (https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.gfa.gz)? I would like to use PRAWNS to characterise this GFA data and then use the results for downstream analysis. Do you have any suggestions? I'm sorry for not replying sooner, and thanks for your hard work.

Thanks again.

-Regards.

Du

KiranJavkar commented 7 months ago

Using PRAWNS to analyze the pan-genomic constructs would need some pre-processing, since PRAWNS doesn't take GFA files as input. You would need to create separate FASTA files, each containing the contigs to be compared. For instance, if you would like to compare 10 paths within a genome assembly subgraph from the GFA file, each of these 10 paths needs to be saved into its own FASTA file. These 10 FASTA files would then act as 10 "genomes" which can be given as input to PRAWNS (through the appropriate input CSV file).
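
As a rough illustration of the decomposition described above, the following sketch writes every path in a GFA file to its own FASTA file. It is not part of PRAWNS and rests on several assumptions: the GFA stores segment sequences on S lines (not `*`), paths are given as P lines (GFA 1.0) or W lines (GFA 1.1 walks), adjacent segments do not overlap, and the whole segment table fits in memory, which may not hold for a full human pangenome graph. All function names and output file names are hypothetical.

```python
import os
import re
from typing import Dict, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def load_segments(gfa_path: str) -> Dict[str, str]:
    """Map segment name -> sequence from the S lines of a GFA file."""
    segments = {}
    with open(gfa_path) as handle:
        for line in handle:
            if line.startswith("S\t"):
                fields = line.rstrip("\n").split("\t")
                segments[fields[1]] = fields[2]
    return segments


def iter_paths(gfa_path: str) -> Iterator[Tuple[str, List[Tuple[str, bool]]]]:
    """Yield (name, [(segment, is_reverse), ...]) for each P or W line."""
    with open(gfa_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "P":
                # P line steps look like "s1+,s2-"
                steps = [(s[:-1], s[-1] == "-") for s in fields[2].split(",") if s]
                yield fields[1], steps
            elif fields[0] == "W":
                # W line: sample, haplotype, sequence id, start, end, walk (">s1<s2")
                name = f"{fields[1]}#{fields[2]}#{fields[3]}:{fields[4]}-{fields[5]}"
                steps = [(seg, o == "<") for o, seg in re.findall(r"([<>])([^<>]+)", fields[6])]
                yield name, steps


def write_path_fastas(gfa_path: str, out_dir: str) -> List[str]:
    """Write one FASTA file per path or walk; return the list of files written."""
    segments = load_segments(gfa_path)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for index, (name, steps) in enumerate(iter_paths(gfa_path)):
        sequence = "".join(
            reverse_complement(segments[seg]) if rev else segments[seg]
            for seg, rev in steps
        )
        fasta_path = os.path.join(out_dir, f"path_{index}.fasta")
        with open(fasta_path, "w") as out:
            out.write(f">{name}\n{sequence}\n")
        written.append(fasta_path)
    return written
```

For a multi-contig assembly you would probably want to group all walks belonging to one sample and haplotype into a single FASTA file rather than one file per walk; the sketch keeps them separate to mirror the "one path, one file" description above. The resulting files still need to be listed in the input CSV that PRAWNS expects.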

There are several GFA-to-FASTA converters available, to name a few:

https://www.biostars.org/p/169516/

https://gist.github.com/fedarko/9fe32014f1e55d80511be0d22dc36830

https://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=6ba709460671df83&tool_config=%2Fsrv%2Ftoolshed-repos%2Fmain%2F004%2Frepo_4583%2Fgfa_to_fa.xml&changeset_revision=e33c82b63727&render_repository_actions_for=tool_shed

The critical challenge would be to decompose the human constructs of interest into separate FASTA files. Once you get to that point, you can run PRAWNS on these FASTA files just as you would for any collection of genomes.

Hope this answers your question. Let me know if you have further queries and thanks again for using PRAWNS!

GeorgeBGM commented 7 months ago

Hi Kiran Javkar.

Thank you very much for your patience in replying.

I would like to ask: if I have more than 100 human genomes (about 3 billion bases per genome), will PRAWNS run slowly, and do you have any suggestions? Beyond that, I look forward to GFA format support in the software.

Thanks again.

-Regards.

Du