fangwuwang / team_Bloodies


Homer Findingmotifs TFBS #15

Open fangwuwang opened 7 years ago

fangwuwang commented 7 years ago

@rawnakhoque I asked the postdoc in our lab and he showed me that everything is done in bash. Follow the installation and basic configuration step by step here. As shown on the webpage, genome configuration is done with this line (see the Download Homer Packages section):
perl /path-to-homer/configureHomer.pl -install hg19
And the analysis itself is only one line to run (link):
findMotifsGenome.pl <peak/BED file> <genome> <output directory> -size # [options]

rawnakhoque commented 7 years ago

@fangwuwang Thanks for the information. I have already installed HOMER on the remote server, since installing Xcode and the related tools was taking too much time on my Mac. Hope it will work.

acavalla commented 7 years ago

@rawnakhoque are you passing it a HOMER peak file or a BED file? I don't understand what "Column5: not used" means (see attached pic).
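For reference, a minimal sketch of building a 6-column BED file from a plain 3-column region list (chromosome, start, end). As far as I can tell from the HOMER documentation, the BED layout it expects is chromosome, start, end, unique peak ID, an unused column, and strand, so "Column5: not used" would just mean that HOMER ignores whatever sits in that column. The helper name `regions_to_bed`, the `region_N` IDs, the `.` placeholder and the `+` strand are all assumptions for illustration, and the coordinates are made up.

```shell
# Hypothetical helper: turn "chr<TAB>start<TAB>end" lines into 6-column BED.
# Column 4 gets a made-up unique ID, column 5 is a placeholder,
# and column 6 is the strand (assumed to be "+" for every region).
regions_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       NF >= 3 { print $1, $2, $3, "region_" NR, ".", "+" }' "$1"
}

# Toy input with made-up coordinates, only to show the output shape.
tmp=$(mktemp)
printf 'chr1\t100\t300\nchr2\t500\t700\n' > "$tmp"
regions_to_bed "$tmp"
rm -f "$tmp"
```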

fangwuwang commented 7 years ago

@rawnakhoque Our postdoc mentioned that Xcode installation may take 1-2 hours since it's 1-2 GB large. But you can try it at the same time as you are running the remote server, 1-2 hours is not that long and it will be very useful to you in the future.

acavalla commented 7 years ago

General note for the future on Xcode: http://railsapps.github.io/xcode-command-line-tools.html. I think my Mac already had Xcode installed, so adding it again from the App Store was unnecessary: "The instructions [...] are confusing. You don’t need to "Get Xcode" from the App Store."

fangwuwang commented 7 years ago

@acavalla Are you around in the BCCRC building for a while? Rawnak is coming here to discuss some results she got from Homer in probably an hour.

acavalla commented 7 years ago

Yep - I'm on the 8th floor. where would you like to meet? @fangwuwang we also need to discuss getting the poster printed :)

fangwuwang commented 7 years ago

@acavalla we can meet in the main floor lunch area or the meeting room on the 13th floor. @rawnakhoque Can you also send an email to Annie (acavalla@bcgsc.ca) to let her know when you arrive?

rawnakhoque commented 7 years ago

I arrived 😊. In the main floor.

fangwuwang commented 7 years ago

@acavalla We are on the 13th floor meeting room.

fangwuwang commented 7 years ago

@rawnakhoque @acavalla Two comparisons have been uploaded to this folder so far and I am working on the other. The promoter file is in the same format as the enhancer files, since I think we are using the same assay for promoters, right? Let me know if there is any problem with the files. If it is a small error, you can fix it in a text editor or Excel, but save it as a Windows-formatted text file.

acavalla commented 7 years ago

@psomdeb25 i think maybe you forgot to filter the file GMP-CLP_promoters_filtered.csv in the methylation results? the other comparisons have ~200 entries but this file has 28000 ;) @fangwuwang are these files generated from files that aren't in the repo? I've been making txt files of the ones in the dna-meth dir, will commit now

acavalla commented 7 years ago

@fangwuwang @rawnakhoque I can't run homer on the promoters now as i can't install wget in my remote server. i can ask admin to do it in the morning so i can run them then, or i could start working on the html file, see if there is a TF database to pull TFs down from?

psomdeb25 commented 7 years ago

@acavalla I have updated the file. You can have a look at it.

psomdeb25 commented 7 years ago

@fangwuwang Do you think we should be discussing the final stage of our analysis on Friday?

fangwuwang commented 7 years ago

@acavalla @rawnakhoque @psomdeb25 I've finished all the text files. As you can see, there are two files (low methylation in either cell type) for each comparison, because I separated the regions by positive and negative differential methylation values, which indicate either higher methylation in HSC compared to MPP (for example) or vice versa. So please run these two files individually for the promoter and enhancer regions of each comparison, which means four files per comparison. I am okay with meeting on Friday to discuss the plan.
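For anyone who needs to redo the split, here is a minimal sketch with awk. It assumes a tab-separated file with one header line and the differential methylation value in a column whose number you pass in; the helper name `split_by_sign` and the `_positive`/`_negative` output names are made up. Which sign corresponds to which cell type depends on how the difference was calculated, so the names are kept neutral on purpose.

```shell
# Hypothetical helper: split_by_sign <input> <value_column> <output_prefix>
# Writes <prefix>_positive.txt and <prefix>_negative.txt.
# Rows with a value of exactly 0 go to neither file.
split_by_sign() {
  awk -F '\t' -v c="$2" \
      -v pos="$3_positive.txt" -v neg="$3_negative.txt" '
    NR == 1 { print > pos; print > neg; next }   # copy the header to both files
    $c > 0  { print > pos }
    $c < 0  { print > neg }
  ' "$1"
}

# Example call (file name and column number are placeholders):
# split_by_sign my_comparison.txt 4 my_comparison
```

If the input has no header line, the `NR == 1` rule should be dropped, otherwise the first region ends up in both files.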

acavalla commented 7 years ago

On Friday I have class on campus until 12.30 and then a meeting at 2pm at the GSC, so would either have to be a quick meeting around 12.30 on campus, or after 3 at the GSC. not ideal i know :(

acavalla commented 7 years ago

@rawnakhoque can you upload some of the html meth files to the repo (maybe into a new dir within DNA-meth) so i can have a look?

acavalla commented 7 years ago

@rawnakhoque I ran homer on one of the promoter txt files - what did you put for the size parameter and other options? i used 200 for the size and -preparse but i got various warnings like "Something is wrong... are you sure you chose the right length for motif finding? i.e. also check your sequence file" and "Illegal division by zero at /projects/acavalla_prj/stat540/homer/bin/findKnownMotifs.pl line 152" and "Use of uninitialized value in numeric gt (>) at /projects/acavalla_prj/stat540/homer/bin/compareMotifs.pl line 1381." help!! mine is tab separated but it's saved as txt, does that make a diff?

rawnakhoque commented 7 years ago

@acavalla I am still working on the enhancer files. I will upload the results once the job is finished. For the size parameter you can use -size given instead of a specific number. It worked for me. Saving as txt should not be a problem; I also saved mine as text.
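I can't say this is the cause of the warnings, but two things that are cheap to rule out are non-Unix line endings (for example after saving from Excel) and rows with too few tab-separated columns. A minimal sketch using only awk; `check_peak_file` is a made-up name, and the default minimum of 5 columns assumes the HOMER peak file layout (ID, chromosome, start, end, strand).

```shell
# Hypothetical helper: check_peak_file <file> [min_columns]
# Reports how many lines contain a carriage return and how many
# non-comment rows have fewer tab-separated columns than expected.
check_peak_file() {
  awk -F '\t' -v n="${2:-5}" '
    /\r/ { cr++ }
    NF > 0 && NF < n && $0 !~ /^#/ { few++ }
    END {
      printf "lines containing carriage returns: %d\n", cr
      printf "rows with fewer than %d tab-separated columns: %d\n", n, few
    }
  ' "$1"
}

# Example call (file name is a placeholder):
# check_peak_file my_promoters.txt
# Carriage returns from Windows-style line endings can be stripped with:
# tr -d '\r' < my_promoters.txt > my_promoters_unix.txt
```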

acavalla commented 7 years ago

@rawnakhoque then i cant get it to work, sorry. i'm not getting an html output file because it says no sequences are found. I'll keep trying and i'll let you know if i get anywhere but i'm not optimistic :(

rawnakhoque commented 7 years ago

@fangwuwang Did you read the paragraph titled 'Finding Instance of Specific Motifs' in http://homer.ucsd.edu/homer/ngs/peakMotifs.html? What do you think? Do we need this analysis? N.B. My remote server account expires tomorrow, so I have to complete all the jobs by then.

fangwuwang commented 7 years ago

@rawnakhoque @acavalla No, we don't need the location information for the scope of this project. I asked for the for-loop command, which is below; I hope it makes the runs more efficient for you (lines starting with # are comments).

# Change directory
cd "/Users/....."

# Create a list of files using wildcards
FILES="folder/folder/xyz_[ab]*_interesting_regions.bed"

# For loop
for f in $FILES; do
  echo "BASH: Processing $f file..."

  # Create an output folder name based on the $f name
  OUT=$(echo "$f" | sed -e 's/\.bed$/__motif/')

  # Run Homer (add the other options you need)
  findMotifsGenome.pl "$f" hg19 "$OUT" -p 6

  echo "BASH: saved results in $OUT folder..."
done
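A slightly more defensive variant of the loop above, as a sketch only. It globs directly in the `for` statement, skips the iteration when nothing matches, and derives the output folder with parameter expansion instead of `sed`. `run_homer_on_beds` and `HOMER_CMD` are hypothetical names added here so the loop can be dry-run with `echo` before any real jobs are launched; `hg19` and `-p 6` are carried over from the command above.

```shell
# Hypothetical wrapper: run_homer_on_beds <directory>
# HOMER_CMD defaults to the real program; set HOMER_CMD=echo for a dry run.
run_homer_on_beds() {
  for f in "$1"/*.bed; do
    [ -e "$f" ] || continue        # the glob matched nothing
    out="${f%.bed}__motif"         # same naming as the sed version
    echo "BASH: Processing $f file..."
    ${HOMER_CMD:-findMotifsGenome.pl} "$f" hg19 "$out" -p 6
    echo "BASH: saved results in $out folder..."
  done
}

# Dry run (prints the arguments instead of starting HOMER):
# HOMER_CMD=echo run_homer_on_beds folder/folder
```

The dry run is mainly useful for checking that the file pattern picks up the intended files and that the output folder names look right.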

rawnakhoque commented 7 years ago

@fangwuwang Thanks for the code. But I submitted the job separately in 4 computers in the server. Five jobs finished and five running. Each job taking 1.5 hours.

rawnakhoque commented 7 years ago

@acavalla Good that it's running. Did you complete any run? How long did it take?

fangwuwang commented 7 years ago

@rawnakhoque Thanks for the high efficiency! Can you update the command you used for installation, setting up and analysis in the repo so that our members can refer to it.

rawnakhoque commented 7 years ago

At first I downloaded the configureHomer.pl script from http://homer.ucsd.edu/homer/introduction/install.html

For installation I used the following command:
perl /Users/chucknorris/homer/configureHomer.pl -install

Then I added the path to my bash profile:
PATH=$PATH:/Users/chucknorris/homer/bin/

Then I checked the list of available packages using:
perl /path-to-homer/configureHomer.pl -list

Then I downloaded the hg19 version of the human genome:
perl /path-to-homer/configureHomer.pl -install hg19

Then I ran the job using:
findMotifsGenome.pl HSC-MPP_enhancer_lowmethyl_MPP.txt hg19 HSC-MPP_enhancer_lowmethyl_MPP/ -size given -mask
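After editing the bash profile, it may be worth confirming that the shell can actually find the HOMER scripts before starting a long job. A minimal sketch; `check_on_path` is a made-up helper, not part of HOMER, and it only relies on the standard `command -v` builtin.

```shell
# Hypothetical check: report which of the given programs are on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_on_path() {
  missing=0
  for prog in "$@"; do
    if command -v "$prog" >/dev/null 2>&1; then
      echo "found: $prog"
    else
      echo "missing: $prog"
      missing=1
    fi
  done
  return "$missing"
}

# The profile is only read by new login shells, so reload it first
# (assuming the PATH line went into ~/.bash_profile):
# . ~/.bash_profile
check_on_path findMotifsGenome.pl ||
  echo "HOMER's bin directory does not seem to be on PATH"
```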

fangwuwang commented 7 years ago

@psomdeb25 Somdeb, are you available today to discuss about the interpretation of methylation results and what results to upload to github and put into the poster? I am on campus all day. @rawnakhoque @acavalla you can join if you are done with the TFBS analyses. Thanks!

psomdeb25 commented 7 years ago

Yes. I am free today.

rawnakhoque commented 7 years ago

@fangwuwang @acavalla @psomdeb25 I have the results for TF binding motif analysis from enhancer region. Please follow the link for the html version.

rawnakhoque commented 7 years ago

@acavalla Have you been able to complete the analysis for the promoter regions? If you are still having any problems, I can do the promoter analysis as well. Please let me know ASAP.

acavalla commented 7 years ago

Hi! I completed one run but i didn't use mask. I'll start over because it's probably better to have the same conditions. It all works now so it'll take 9h, I can do some over the weekend too

fangwuwang commented 7 years ago

@rawnakhoque @acavalla Sounds good. It might be necessary to keep the parameters the same across enhancer/promoter analyses. After you finish and upload the TF data, I will try to do the clustering analysis on normal and leukemia RNA-seq data. It might be better to split the job so that we get the data earlier.

rawnakhoque commented 7 years ago

@acavalla Can you mention the files you will be working on so that I can work on the others.

acavalla commented 7 years ago

I'm running them all in the for loop, so they'll run overnight and finish when they finish. I can upload them to the github tomorrow. I've got one set already, so I'll upload that now.

rawnakhoque commented 7 years ago

OK. I am running CMP-MLP-CMP, CMP-MLP-MLP and GMP-CLP-CLP. Maybe we can compare our results for these three.

acavalla commented 7 years ago

I ran with -size given, -mask and -preparse (I'm not sure what that one means but it complained when i didn't use it). I've uploaded the known motifs for CMP-MLP-CMP here, so download it and have a look

rawnakhoque commented 7 years ago

@acavalla Thanks for your effort in running this. You mentioned the known motif results file here, but aren't we interested in the de novo motif results file? I also checked my known motif results file against yours and they are different: you got 14 motifs while I got 26. How many motifs did you get in your de novo motif file? I got 49. Also, your file naming is confusing: you named it CMP-MLP_output, from which it is not clear which progenitor is your target. Is it CMP-MLP-CMP or CMP-MLP-MLP? Please keep the name as it is in the original text file; it's more readable for other people.

rawnakhoque commented 7 years ago

@fangwuwang Could you please post an update on your analysis, and let me know if you would like me to do some of it? If so, you can email me the details.

fangwuwang commented 7 years ago

Thanks @rawnakhoque, I want to inspect the RNA expression of the transcription factors in the normal cell data, is it possible to get the gene symbols (shown as hgnc_symbol in your converted list) for the raw data (all transcripts) ?

rawnakhoque commented 7 years ago

@fangwuwang Please find the files here. I split the file into (raw_genes_1, 2, and 3) since the program got stuck due to the big file. I uploaded the code as well.

fangwuwang commented 7 years ago

Thanks @rawnakhoque, looks great.

fangwuwang commented 7 years ago

@rawnakhoque Can you please look at the clustering analysis as well (refer to this seminar)? Sorry, I am working on the expression of the TFs and the introduction of the poster and may not be able to dedicate time to it. Have you done any promoter analysis of the TFBS? I am not sure where Annie is with her analysis. Thank you.

fangwuwang commented 7 years ago

@rawnakhoque You mentioned you have done both known and de novo motif finding, but in the folder, only known motif results are there. Can you upload de novo results as well? I found this Homer page provides great details about the analysis mechanisms and output explanation. We can see that the (13. de novo output) is different from (14. known motif output) in terms of the layout of html page. @acavalla @psomdeb25

rawnakhoque commented 7 years ago

@fangwuwang Sorry, my bad. Now uploaded the de novo results file as well.

rawnakhoque commented 7 years ago

@fangwuwang @acavalla I am running homer for rest of the promoter groups.

acavalla commented 7 years ago

I'm going in to work later so I can check where the analysis is at then. It should be finished and then I'll upload it all. The one that's up already is CMP MLP CMP, sorry for the confusion

fangwuwang commented 7 years ago

@rawnakhoque Also, for the gene ID conversion of the RNA-seq file, can you tell me how you separated the raw data into three parts (rows [x to y] converted to gene list 1, rows [y to z] converted to gene list 2, and so forth)? When you add up the rows of the three gene list files, some rows are missing compared to the original data, which makes matching back to the original file very difficult. Thanks.

fangwuwang commented 7 years ago

@acavalla Can you let us know what has been done so Rawnak doesn't need to run it again? Also, I don't know what the advantage of the de novo finding is; the results are quite different from the known finding. I just thought maybe we could pool the two results together for the rest of the analysis, such as expression level and clustering. @rawnakhoque

rawnakhoque commented 7 years ago

@fangwuwang This is not due to missing rows. The input was correct, but the program did not find gene IDs for some of the transcripts, so the number of rows dropped. You can see the files for the transcript IDs here
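To make matching back to the original file easier, the transcripts that did not get a gene ID can be listed explicitly. A minimal sketch, assuming both files are tab-separated with the transcript ID in the first column and no header line; `missing_ids` is a made-up helper name and the file names in the example call are placeholders.

```shell
# Hypothetical helper: missing_ids <original_file> <converted_file>
# Prints first-column IDs that appear in <original_file>
# but not in <converted_file>.
missing_ids() {
  awk -F '\t' '
    FILENAME == ARGV[1] { seen[$1] = 1; next }   # IDs that were converted
    !($1 in seen)       { print $1 }             # original IDs with no match
  ' "$2" "$1"
}

# Example call:
# missing_ids raw_transcripts.txt raw_genes_1.txt > unconverted_ids.txt
```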

acavalla commented 7 years ago

I ran the for loop in the shell for all the promoters, so they should all be done. I don't think we should present all the found motifs either - we're not trying to find sequences, just link them to TFs I thought?