MatthewTCManion opened this issue 2 months ago
Your BED file is not in BED format, which expects the columns chr, start, end, name.
This will fix it:
tail -n +2 fwf_gene_assignments.bed | awk -F "\t" -v 'OFS=\t' '{ print $2, $3, $4, $1; }' > fwf_gene_assignments.fixed.bed
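A minimal sketch of the same fix with a sanity check added. Like the one-liner, it assumes a tab-separated input with one header line and the columns in the order name, chr, start, end. The function names to_bed4 and check_bed4 and the example file names are hypothetical. The checks only cover the column count and integer coordinates with start < end, not chromosome names.

```shell
# to_bed4: drop the header line and move the name column (1) to the end,
# giving chr, start, end, name (tab-separated), same as the one-liner above.
to_bed4() {
  tail -n +2 "$1" | awk -F '\t' -v OFS='\t' '{ print $2, $3, $4, $1 }'
}

# check_bed4: print problems found in a 4-column BED file;
# exit status is 1 if any line fails a check, 0 otherwise.
check_bed4() {
  awk -F '\t' '
    NF != 4 { printf "line %d: expected 4 columns, found %d\n", NR, NF; bad = 1; next }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { printf "line %d: start/end are not integers\n", NR; bad = 1; next }
    ($2 + 0) >= ($3 + 0) { printf "line %d: start is not smaller than end\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Toy input with hypothetical regions: header, then name, chr, start, end.
printf 'name\tchr\tstart\tend\nregion1\tchr1\t100\t600\nregion2\tchr2\t2000\t2500\n' > example_regions.tsv
to_bed4 example_regions.tsv > example_regions.bed
check_bed4 example_regions.bed && echo "example_regions.bed passed the checks"
```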
Thank you for your reply! I tried running it with this suggestion, and the BED file now seems to be in the correct format, but the same error came up:
Please let me know if you have any additional suggestions!
Best, Matt
Okay, so I was able to get the BED file into the proper format and ran create_cistarget_track_databases.py, but I am getting an issue with scoring the tracks for each of my 4 input .bw files:
What is the output of, for example, the first bigWigAverageOverBed command?
/data/PetrosLab/Conda/envs/create_cistarget_databases/bin/bigWigAverageOverBed -minMax /data/PetrosLab/Matt/scenicplus/chipseq/tracks/MGE2-60_S13.cpm.norm.bw /data/PetrosLab/Matt/scenicplus/chipseq/tracks/consensus_tabbed.bed /dev/stdout
(Best copy it from your Slurm output, as I might have made typos.)
You should probably make the BED files as I suggested:
tail -n +2 fwf_gene_assignments.bed | awk -F "\t" -v 'OFS=\t' '{ print $2, $3, $4, $1; }' > fwf_gene_assignments.fixed.bed
with only 4 columns, as bigWigAverageOverBed probably does not accept your 5th column.
Also, creating a track database only makes sense when you have several hundred tracks (or, even better, thousands), rather than only 4.
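If a BED file carries extra columns, a sketch like the following keeps only the first four and lists duplicated region names. The duplicate check is an added assumption: bigWigAverageOverBed reports one output row per name, so unique names seem safest. The helper names trim_to_bed4 and duplicate_names and the example files are hypothetical.

```shell
# trim_to_bed4: keep chr, start, end, name and drop any further columns.
trim_to_bed4() {
  cut -f 1-4 "$1"
}

# duplicate_names: print each region name that occurs more than once.
duplicate_names() {
  cut -f 4 "$1" | sort | uniq -d
}

# Toy 5-column input (hypothetical regions; the 5th column is an extra score).
printf 'chr1\t100\t600\tregion1\t0.5\nchr1\t800\t900\tregion2\t1.2\n' > example_5col.bed
trim_to_bed4 example_5col.bed > example_4col.bed
dups=$(duplicate_names example_4col.bed)
if [ -n "$dups" ]; then
  echo "duplicated region names:" >&2
  echo "$dups" >&2
fi
```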
Thanks for your help! I dropped it to 4 columns, and it seems to have run and created the .feather rankings and scores.
"Creating a track database also only would make some sense when you have several hundreds of tracks (even better thousands), instead of only 4."
Can you clarify what you mean by this, so that I can make sure I am using this pipeline correctly?
I have run SCENIC+ using motifs and the motif database, which I understand has thousands of motifs.
My understanding is that I am creating the track database from my ChIP-seq data (one track per sample) by scoring binding in these tracks against the region BED file created by running pycisTopic on my ATAC-seq data. The result is a database that can be used in the SCENIC+ pipeline as a way to target regions where I have TF binding in my samples. Am I understanding that incorrectly?
If so, do you have any suggestions on the proper way to incorporate my TF-binding data into the identification of GRNs?
Thanks, Matt
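The rankings database can be pictured as: for each track, all regions ordered by their score in that track. A minimal sketch for a single track, assuming bigWigAverageOverBed output (columns name, size, covered, sum, mean0, mean) as input and using column 5 (mean0) as the score purely for illustration. Which score create_cistarget_track_databases.py actually ranks on is not stated in this thread. rank_regions and the file names are hypothetical.

```shell
# rank_regions FILE [SCORE_COLUMN]: sort bigWigAverageOverBed-style rows by the
# score column (default 5, mean0), highest first, and print "name<TAB>rank".
rank_regions() {
  local score_col tab
  score_col=${2:-5}
  tab=$(printf '\t')
  LC_ALL=C sort -s -t "$tab" -k "${score_col},${score_col}gr" "$1" |
    awk -F '\t' -v OFS='\t' '{ print $1, NR }'
}

# Toy scores for one track (hypothetical): name, size, covered, sum, mean0, mean.
printf 'regionA\t500\t500\t100\t0.2\t0.2\nregionB\t500\t500\t400\t0.8\t0.8\nregionC\t500\t250\t150\t0.3\t0.6\n' > example_scores.tsv
rank_regions example_scores.tsv > example_ranks.tsv
cat example_ranks.tsv  # prints regionB 1, regionC 2, regionA 3 (tab-separated)
```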
Hi @MatthewTCManion
Do you want to create a single database containing both motif and ChIP-seq scores? Or a database containing only ChIP-seq data?
The latter probably does not make much sense: with only a few tracks, calculating the enrichment values will be impossible.
The former might be a good idea; however, we have not tried this yet in the context of SCENIC+.
All the best,
Seppe
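Why a handful of tracks makes enrichment impossible can be made concrete. In cisTarget-style analyses, the normalized enrichment score (NES) of a motif or track is, roughly, a z-score of its AUC against all motifs or tracks in the database. With n values, a z-score computed with the population standard deviation can never exceed sqrt(n - 1), so with 4 tracks the NES is capped at about 1.73, below the commonly used NES threshold of 3.0. The sketch assumes that z-score definition and uses made-up AUC values; nes_from_aucs is a hypothetical helper.

```shell
# nes_from_aucs: read one AUC value per line and print a z-score-style NES
# per track, (AUC - mean) / sd, using the population standard deviation.
nes_from_aucs() {
  awk '
    { auc[NR] = $1; sum += $1 }
    END {
      n = NR
      if (n == 0) exit 1
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (auc[i] - mean) ^ 2
      sd = sqrt(ss / n)
      if (sd == 0) { print "all AUC values are equal; NES is undefined" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= n; i++) printf "track%d\tNES=%.3f\n", i, (auc[i] - mean) / sd
      printf "ceiling sqrt(n - 1) = %.3f\n", sqrt(n - 1)
    }
  '
}

# Made-up AUCs: one track strongly enriched, three not enriched at all.
# prints track1 NES=1.732, then NES=-0.577 for the other three tracks
printf '%s\n' 0.90 0.01 0.01 0.01 | nes_from_aucs
```

Even in this most extreme case, the enriched track only reaches the ceiling of 1.732. With the sample standard deviation instead, the ceiling is even lower: (n - 1)/sqrt(n) = 1.5 for n = 4.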
@SeppeDeWinter Ideally, a database containing both motif and ChIP-seq scores. Thank you for the reply!
@SeppeDeWinter Do you think it would be possible to use the outputs of a motif-discovery tool (we have successfully identified motifs in our ChIP-seq data using the MEME suite of tools) to create a motif database for SCENIC+ targeted to motifs in our ChIP-seq set?
@MatthewTCManion That should be possible. Convert your motifs (from HOMER/MEME/...) to Cluster-Buster format (you can use Biopython for this if you want; make sure that your PWM contains counts and not frequencies, otherwise multiply by 100) and use them together with our provided motif collection to make your own database. Later, you will have to add your motif to the motif2tf table so that, if your motif is found, it is actually used by the SCENIC+ analysis.
Against which TFs did you do ChIP-seq, and are the motifs you obtained not already in our motif collection?
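A minimal sketch of the frequency-to-count conversion, assuming the motifs are in MEME minimal format (MOTIF lines each followed by a letter-probability matrix with columns A, C, G, T). It multiplies each probability by 100, as suggested above, and writes Cluster-Buster format: a >name line followed by one row of four values per position. It uses awk instead of Biopython; meme_to_cb, the motif name, and the file names are hypothetical.

```shell
# meme_to_cb FILE: convert MEME minimal-format motifs (letter-probability
# matrices, columns A C G T) to Cluster-Buster format, multiplying each
# frequency by 100 so the matrix holds count-like values.
meme_to_cb() {
  awk '
    /^MOTIF/ { name = $2; in_matrix = 0; next }
    /^letter-probability matrix/ { print ">" name; in_matrix = 1; next }
    in_matrix && NF == 4 {
      printf "%g\t%g\t%g\t%g\n", $1 * 100, $2 * 100, $3 * 100, $4 * 100
      next
    }
    { in_matrix = 0 }  # a blank line or any other line ends the current matrix
  ' "$1"
}

# Toy MEME minimal-format input with one hypothetical 2-position motif.
cat > example_motifs.meme <<'EOF'
MEME version 4

ALPHABET= ACGT

MOTIF my_nkx2-1_motif
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0
 0.1 0.2 0.3 0.4
 0.25 0.25 0.25 0.25

EOF
meme_to_cb example_motifs.meme > my_nkx2-1_motif.cb
cat my_nkx2-1_motif.cb
```

If the input file holds several motifs, the output file will hold several >name blocks; split them per motif if your database-building step expects one motif per file.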
I will try that!
We're using Nkx2.1 for our ChIP-seq.
It looks like a motif for Nkx2-1 is at least in JASPAR, so it should already be detected with our default motif collection: https://jaspar.elixir.no/matrix/MA1994.2/
We have seen some inconsistency in the Nkx2-1 motif between different databases, so one thing we have done is to generate the motif from our own binding data in multiple Nkx2-1-expressing tissues, to hopefully capture a more consistent motif. I agree, though, that it should at least partially resemble the JASPAR Nkx2-1 motif.
[Screenshots: Nkx2-1 motif from HOCOMOCO and from JASPAR]
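For the motif2tf step, one hedged approach is to reuse the annotation of an existing Nkx2-1 motif and give it your new motif ID. The sketch assumes a tab-separated table whose header row contains a gene_name column and whose first column holds the motif ID; check both against the motif2tf file you actually use, and review the copied row by hand. add_motif_rows, the motif IDs, and the file names are hypothetical.

```shell
# add_motif_rows TABLE TF NEW_ID: print the first row of a tab-separated
# motif2tf-style table whose gene_name column equals TF, with the first
# column (assumed to be the motif ID) replaced by NEW_ID.
add_motif_rows() {
  awk -F '\t' -v OFS='\t' -v tf="$2" -v new_id="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "gene_name") col = i
      if (!col) { print "no gene_name column in the header" > "/dev/stderr"; exit 1 }
      next
    }
    $col == tf && !done { $1 = new_id; print; done = 1 }
    END { if (col && !done) { print "no row found for " tf > "/dev/stderr"; exit 1 } }
  ' "$1"
}

# Toy table with hypothetical motif IDs and annotations.
printf '#motif_id\tmotif_name\tgene_name\nexample_motif_A\tNkx2-1 motif\tNkx2-1\nexample_motif_B\tother motif\tFoxa2\n' > example_motif2tf.tbl
add_motif_rows example_motif2tf.tbl Nkx2-1 my_nkx2-1_motif
```

With placeholder file names, usage could look like: cp motif2tf.tbl motif2tf.custom.tbl && add_motif_rows motif2tf.tbl Nkx2-1 my_nkx2-1_motif >> motif2tf.custom.tbl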
Hello, I am running into an issue using this script: the .bed file with regions to score is not recognized correctly, and I have tried a few different formats with no success. For reference, here is a screenshot of my most recent attempt to run the script, as well as the format of my .bed file:
I assume the issue is with the format of the .bed or the genes/regions data, but I can't find what the proper format should be.
Thanks, Matt
Originally posted by @MatthewTCManion in https://github.com/aertslab/create_cisTarget_databases/issues/17#issuecomment-2317906079