Rootana production 2 - Githubissues

benkrikler commented 10 years ago

This is just to get the ball rolling with production 2. I believe the code is all there now for part 1 of elog:152 but still needs refinement in terms of plot scales and axes and some algorithm parameters.

If part 2 is going to take some time I will try to write a new AP generator that combines the 3 simple existing ones (max bin, CFT and Simple integral) so we have TAPs with all fields filled.

benkrikler commented 10 years ago

Where have we got to with all this then? I'm planning on running over Al50b tonight. I think we can only use the run scripts if we've tagged the code, so what's realistic to be ready by tonight (ie. about 7 hours time for me)?

AndrewEdmonds11 commented 10 years ago

So we have the TAP generator with all fields filled. I'm having some problem with adding the PulseCandidateFinder (the channel names are going a bit screwy so I can't get the parameter values for start/stop conditions) so it might not be done (not helped by the fact that I have to leave early and that my laptop has broken so can't do any work at home) but this isn't a critical problem and we can run without it.

benkrikler commented 10 years ago

Is this another problem with the Channel IDs (please god, no..)? Do you want me to look into something and is this something I could try helping out with if you don't get it done in time?

AndrewEdmonds11 commented 10 years ago

No worries, just solved it. I was creating the PCF in the constructor of the generator and was passing the channel. But the channel doesn't exist until the ProcessPulses function, so it's all OK now.

I may still need to make some tweaks to make sure that the start/stop conditions are good and give us the whole pulse for analysis, though

benkrikler commented 10 years ago

Oh, sweet! If we run tonight and uncover bugs, then hopefully there's still time tomorrow to run again and get results that can be discussed and shown on Wednesday. Then there's two more days to repeat once more before the Friday meeting.

benkrikler commented 10 years ago

On my side, TDPs seem to be working and I'm able to plot the correlated amplitudes, timing difference and plot amplitudes of one channel based on a cut on the other. Minor tweaks to the output are still going in, and I might break the code up into 2 separate modules, but a first iteration of the code for Task 1 is there then. A possible extension is to look at the integrals as well as the amplitudes; is this worth doing at this point?

I'm also adding a PlotIntegrals module which just does 1D histograms of all Integrals the same as PlotAmplitudes.

benkrikler commented 10 years ago

The PlotIntegrals module is up and the TDP plots for task 1.c and 1.d is in place.

benkrikler commented 10 years ago

Nearly everything we need is there now, so I've started testing actual production configs.

Sadly, It looks like I'm going to have to take the pulse finder out of the MegaTAPs for now since it slows a single event run down from 0.5 seconds to 27. At half a minute per event it would take 12 hours or so to do a single run, so it's not realistic for Wednesday. Depending how much we can optimize it before Friday, we may or may not be able to use it.

@AndrewEdmonds11 or @litchfld, would you be happy to start testing a production? It's 3am for me so I should head to bed. I started preparing things on the new branch feature/BK_prepare_production_2. The production.cfg modules file has been moved into the configurations directory and updated with everything I would like to run ideally (except Damien's TDiff modules which we'll add once they're ready). Depending on memory consumption and time we may need to break the runs into smaller steps so we'd need a single run script and 2 or more modules files for each step.

AndrewEdmonds11 commented 10 years ago

Yes, I can start testing production. I'll update this issue with what I find

AndrewEdmonds11 commented 10 years ago

I think the massive slow down with the PCF is because it is instantiated in the constructor, where the channel is set to "*", which means the parameter value is 0 and so we get loads O(10) sub pulses.

I've (very) poorly explained this before here:

No worries, just solved it. I was creating the PCF in the constructor of the generator and was passing the channel. But the channel doesn't exist until the ProcessPulses function, so it's all OK now.

and here:

Yes, make any changes you like (especially where I've been an idiot). For the initializations in the constructor when I had the PCF there I was getting "*" for the channel string.

Anyway, I know that it might be worse to keep on creating a PCF every TPI but otherwise it doesn't work. Unless I find a way to have the channel correctly defined in the constructor...

AndrewEdmonds11 commented 10 years ago

So it makes no real difference in terms of how long it takes whether we have the PCF created in the constructor (and we are looping through many more "pulses") or not (and we have to create the same PCF each time. I think the main inefficiency comes with this line: fParameterValue = opts->GetInt(detname, fDefaultParameterValues[fChannel]) since the fDefaultParameterValues and fOneSigmaValues maps only get filled once in total).

I'll try and find a way to have the correct detector name available in the generator constructor. Hopefully, I don't break anything...

AndrewEdmonds11 commented 10 years ago

So the problem comes in MakeAnalysedPulses::AddGenerator:

// Get the requested generator
    TVAnalysedPulseGenerator* generator=
        TAPGeneratorFactory::Instance()->createModule(generatorType,opts);
    if(!generator) return false;
    generator->SetChannel(detector);

we create the generator and then we set the channel after it's been created. I don't want to play with the createModule() function so, for the time being, I'll try and overload the SetChannel function to also create the PCF

AndrewEdmonds11 commented 10 years ago

I'll try and overload the SetChannel function to also create the PCF

Can't do this because fSource is private. This is probably for good reasons so I'll try and find another way. Instead of spamming people I'll just let you know when I solve it

AndrewEdmonds11 commented 10 years ago

So I left this issue for a bit and tested the new production config file. A couple of small fixes:

corrected the name of PlotIntegral in production.cfg
uncommented the calculation of integral from the TAP generator

benkrikler commented 10 years ago

Good morning! So where have we got to?

I'll try and overload the SetChannel function to also create the PCF

Can't do this because fSource is private.

That sounds like the right way to go. You shouldn't need to touch fSource though, we just need to make SetChannel virtual then you call TVAnalysedPulse::SetChannel from the derived class' version. I can quickly put this in now, but I don't want us to lose more time with this as this point.

AndrewEdmonds11 commented 10 years ago

So, I ended up doing the following, which should mean we create a PCF once for each detector but it hasn't sped up the program:

In TVAnalysedPulse:

@@ -7,6 +7,7 @@
 #include "definitions.h"
 #include "TAPGeneratorOptions.h"
 #include "TAnalysedPulse.h"
+#include "PulseCandidateFinder.h"

 class TPulseIsland;

@@ -43,6 +44,11 @@ class TVAnalysedPulseGenerator {
         /// \todo Document this
         virtual bool MayDivideTPIs()=0;

+       /// \brief
+       /// This method is overloaded by a generator if it MayDivideTPIs
+       void SetPulseCandidateFinder(PulseCandidateFinder* pcf) { fPulseCandidateFinder = pcf; }
+       PulseCandidateFinder* GetPulseCandidateFinder() { return fPulseCandidateFinder; }
+
         void SetPulseList(PulseIslandList* list){fPulseList=list;};
         /// \brief
         /// The suggested method for constructing a new TAP.
@@ -114,6 +120,10 @@ class TVAnalysedPulseGenerator {
         /// \brief
         /// The vector of TPIs being processed.
         PulseIslandList* fPulseList;
+
+       /// \brief
+       /// The pulse candidate finder that the generator may or may not use
+       PulseCandidateFinder* fPulseCandidateFinder;
 };

In MakeAnalysedPulses.cpp:

@@ -207,7 +207,9 @@ bool MakeAnalysedPulses::AddGenerator(const string& detector,string generatorTyp
         TAPGeneratorFactory::Instance()->createModule(generatorType,opts);
     if(!generator) return false;
     generator->SetChannel(detector);
-
+    if (generator->MayDivideTPIs()) {
+      generator->SetPulseCandidateFinder(new PulseCandidateFinder(detector, opts));
+    }

and I made the appropriate changes in FirstCompleteAPGenerator.

As I said, it's made no real difference, so I'm currently profiling the program (gprof didn't work for some reason so now trying valgrind).

Not sure how you want to play this. I think it would be nice to do a production without the PCF first so we can see how it changes things later. If we run with the PCF for tomorrow, we should just do a handful of runs (we can do 20 in parallel on Merlin, right?) and then possibly run over all of them for a couple of days and save the TAPs so we save time later.

AndrewEdmonds11 commented 10 years ago

Ok, I think I've found it. It's because in FindCandidatePulses_Slow and FindCandidatePulses_Fast, I do this:

  double pedestal = SetupNavigator::Instance()->GetPedestal(bankname);
  double noise = SetupNavigator::Instance()->GetPedestalError(bankname);

which goes through the SQLite file. We already have the pedestal noises in fOneSigmaValues and so I'll just use those and then do something similar for the pedestals so we don't have to keep retrieving them each time.

BTW, the commands I ran to get a nice GUI for the profile:

> valgrind --tool=callgrind ./rootana -i ......
> kcachegrind callgrind.out.X

benkrikler commented 10 years ago

Ah, sorry Andy, I was maybe a bit hasty, cause I just added the code I was suggesting above and pushed it.

I don't think however that MakeAnalysedPulses should be setting things on the generator like this. It breaks the way we separate out the algorithms I think. In any case, what you suggest is a bit of work-around which should no longer be needed.

Not sure how you want to play this. I think it would be nice to do a production without the PCF first so we can see how it changes things later.

How about this then, I'll focus on getting a production going using just MaxBin which will still be able to profile and check the plotters and TDP creation since it fills time and amplitude. In the mean-time, you can try and speed up the PulseFinder?

benkrikler commented 10 years ago

Must of hit post at the same time there! That definitely sounds like a problem. Can you move it into the constructor and then test?

I was using callgrind_annotate last night, but it just gives you text output, so it's a bit trickier to read maybe

AndrewEdmonds11 commented 10 years ago

So I replaced this:

double noise = SetupNavigator::Instance()->GetPedestalError(bankname);

with this:

double noise = fOneSigmaValues[IDs::channel(fChannel)];

in both the fast and slow algorithms.

This meant 5 events took 2 minutes rather than 4. Also, if I store the noise and pedestal as member variables and retrieve them once like you suggest, then it will be much quicker but we will see...

benkrikler commented 10 years ago

Avoid doing any sort of find algorithm in the main processing functions. That includes operator[] on a map. Since we know the channel from the constructor, we should initialise everything immediately. If that uses a map or something else that's fine, but the look-up of the value in that table should not need repeating each time we're asked to process a TPI.

benkrikler commented 10 years ago

I've just run the production module file but using MaxBin as the only generator. On run 3329 which has 1409 entries, it took just under 3 minutes. Do you reckon we can squeeze the MegaTAP generator into 10 times that? The output file size is about 91MB. I'll add in the other old generators to compare this to the last run.

AndrewEdmonds11 commented 10 years ago

Ok, with all my changes. Run 2808 (2609 entries) took 8 minutes :D

So, this looks like problem solved, right? Also, I've been doing this work on your BK_prepare_production_2 branch, Ben, so shall I commit and push my changes now before you do anything else?

benkrikler commented 10 years ago

Great! I've been burning up resources using 3 generators for each channel (It makes an awful lot of plots), so having a functioning one which does it all is really good.

I would commit and push, but first make sure to pull, and if you're ok with it, could you remove the changes you described above to MakeAnalysedPulses and TVAnalysedPulse? Or you can just commit and push and I'll dig around after.

We can then tag this and try running this beast...

AndrewEdmonds11 commented 10 years ago

could you remove the changes you described above to MakeAnalysedPulses and TVAnalysedPulse?

I was just doing this and implementing the changes you described. Here is what I have at the moment for the overloaded SetChannel function:

virtual void SetChannel(const std::string& det){ TVAnalysedPulseGenerator::SetChannel(det); fPulseCandidateFinder =  new PulseCandidateFinder(GetChannel.str(), fOpts); };

and the compiler is complaining that the TVAnalysedPulseGenerator::SetChannel is also private. MakeAnalysedPulses got around it because it is a friend class. any ideas here?

benkrikler commented 10 years ago

You can just pull this branch cause i added the changes and pushed it all.

AndrewEdmonds11 commented 10 years ago

Ok, so I committed my changes to PulseCandidateFinder and FirstCompleteAPGenerator and then pulled in your changes (needed to resolve a really simple conflict) and now everything is pushed.

However, it is now not compiling with the error:

Making object: 'obj/FirstCompleteAPGenerator.o'
In file included from src/TAP_generators/FirstCompleteAPGenerator.cpp:2:0:
src/TAP_generators/FirstCompleteAPGenerator.h:38:17: error: ‘virtual void FirstCompleteAPGenerator::SetChannel(const string&)’ cannot be overloaded
In file included from src/TAP_generators/FirstCompleteAPGenerator.cpp:2:0:
src/TAP_generators/FirstCompleteAPGenerator.h:23:17: error: with ‘virtual void FirstCompleteAPGenerator::SetChannel(const string&)’
In file included from src/TAP_generators/FirstCompleteAPGenerator.cpp:2:0:
src/TAP_generators/FirstCompleteAPGenerator.h: In member function ‘virtual void FirstCompleteAPGenerator::SetChannel(const string&)’:
src/TAP_generators/FirstCompleteAPGenerator.h:38:156: error: ‘((FirstCompleteAPGenerator*)this)->TVAnalysedPulseGenerator::GetChannel’ does not have class type
make: *** [obj/FirstCompleteAPGenerator.o] Error 1

which doesn't make any sense to me.

Also, on a aide note, we'll need to do a production run of PlotPedestalAndNoise to get the SQLite database for all the runs we'll do.

AndrewEdmonds11 commented 10 years ago

Solved it. There were two versions of the same function floating around after a merge.

benkrikler commented 10 years ago

Cool! So we're good to go. You've done this before with alcapana right? We need to merge this feature branch into develop, then take that into master then tag it all right?

AndrewEdmonds11 commented 10 years ago

Well, the git flow way is to do a release branch, which is kind of what we've done already. I'll do it just as soon as I check rootana runs correctly

Also, how are we going to generate the SQLite file?

benkrikler commented 10 years ago

Oh bugger, you mean this should have been a release not a feature? Can we make the release branch then use native git to merge the feature branch into that then use git flow to finish it all?

Do we need an sqlite file for each run or can we reuse one for a single dataset? I guess it depends how stable the DAQ was so one per run makes more sense? This is a one off process so future productions won't need to run this unless we change the module correct?

@jrquirk, can you tell us how to set up your production script? Is it able to call some generic executable / bash script or does it directly run rootana with a given configuration?

AndrewEdmonds11 commented 10 years ago

Oh bugger, you mean this should have been a release not a feature? Can we make the release branch then use native git to merge the feature branch into that then use git flow to finish it all?

I was just going to create a release branch and then finish it straight away. I'll make a not of the commands and add them to the wiki after I've finished.

Do we need an sqlite file for each run or can we reuse one for a single dataset? I guess it depends how stable the DAQ was so one per run makes more sense? This is a one off process so future productions won't need to run this unless we change the module correct?

The way SetupNavigator is coded right now is that it also has to match the run number so, for the time being, we will need one per run.

We could either have two config files in this tag (one with just PlotPedestalAndNoise) and do those first (shouldn't take too long) and then run with the proper config.

benkrikler commented 10 years ago

We could either have two config files in this tag (one with just PlotPedestalAndNoise) and do those first (shouldn't take too long) and then run with the proper config.

I think that's the way we'll have to go.

I would like to stop the SetupNavigator from just crashing though when it's missing the input sqlite file. If we miss a run in the production or miss-name files things should throw out meaningful errors and then stop, but at the moment it just seg-faults.

AndrewEdmonds11 commented 10 years ago

Ok, I've updated the master branch now.

One problem, I've just thought of, is that the SQLite database is created local to the user and so if we are all running PlotPedestalAndNoise on different runs we'll all end up with different versions of the table and there's no guarantee we would run the same runs on the second go-through.

Shall I just run over all the Al50(b) runs with PlotPedestalAndNoise first before we do the production run? @jrquirk, is there a way to only select certain runs to be processed in your jobscripts?

benkrikler commented 10 years ago

John's wiki on the production scripts is fairly complete. Correct me if I'm wrong John, but there's not an option to run on just one dataset? Also, I can't see where we set the task to run for this production, but given you store the modules file in the database, am I right that you call rootana directly and then supply the modules file to it?

AndrewEdmonds11 commented 10 years ago

Ok, so I guess John is otherwise engaged. I've tried running on the Merlin login cluster for one run with PlotPedestalAndNoise only and it took two minutes.

I can set up a script to run over just the Al50b runs (there are 157 of them) which would take about 5 and a half hours. What time would it be in US/Brazil after this? I could start running now and then let you guys copy the SQLite file and do the actual production yourselves. It's a very roundabout way of doing it but if we want some results by tomorrow then this is probably the best bet.

benkrikler commented 10 years ago

5 hours from now is about 5pm for me.

Why don't we divide up the runs and just process 10 or 20 in their entirety then go for the rest if there's time?

AndrewEdmonds11 commented 10 years ago

So we wouldn't be doing this on the Merlin batch system for the time being then. Is that what you mean?

jrquirk commented 10 years ago

If you want to use the jobscripts, you can start a new production with the --new flag with something like

$ ./run_production.py --production=rootana --new=rootana_v19 --version=3

And then everyone else can just run

$ ./run_production.py --production=rootana

and they'll latch on to your rootana production version. In the above, the --version=3 option means "base this new production off of alcapana version 3," the rootana version will be automatically incremented from the last rootana production version and will be printed out in all cases.

Or you can write your own script. I've just been using short bash scripts for my work recently just out of habit.

jrquirk commented 10 years ago

And there's no way as of yet to specify a dataset.

AndrewEdmonds11 commented 10 years ago

Ok, I've just found (and fixed) a bug in PlotPedestalAndNoise where extra runs were not getting lines added to the SQLite table.

So shall we just run about 10 runs each tonight (I've got about half of the script we would need), we look at the plots tomorrow in the meeting (presumably we can just hadd the files?) and then we do the full production for Friday. How does that sound?

benkrikler commented 10 years ago

If we can run on the batch system that's better, but if we don't do many runs then using the login shell is probably fine.

@jrquirk, where in all this do we specify what get's run? I found the batch_rootana.sge script in analyzer/batch. Is this what is sent to the batch system then? If we want to do two different passes of a single run using 2 different modules files then how would we do it using your scripts? Or are you suggesting we just write small one-off scripts to do this?

If we're each able to do 10 runs tonight that would be good at least to give us a check of what's working and to do an initial inspection of the results at tomorrow's meeting.

AndrewEdmonds11 commented 10 years ago

Ok. I've merged my hotfix into master now. Here's a script I use to run several files with just the PlotPedestalAndNoise module on:

#!/bin/bash                                                                                                                                                                                          
# generate_pedestal_and_noise.sh                                                                                                                                                                     
# -- runs rootana over a set of runs with just the PlotPedestalAndNoise module on so that                                                                                                            
#    the peedestals-and-noises.sqlite file can be created                                                                                                                                            

RUN_NUMBERS=( 3{563..564} )

for run in ${RUN_NUMBERS[@]} ; do
    filename="/gpfs/home/edmonds_a/data/tree/tree0$run.root"
    outfilename="out$run.root"
    if [ -e $filename ] ; then
        echo "$filename exists"
        ./rootana -m andyModule.txt -i $filename -o $outfilename
    fi
done

where andyModule.txt is:

#debug=all
list_modules
dump_contents

[ MODULES ]
plot_pedestal_and_noise = PlotPedestalAndNoise
## Get / save pulses
# load_pulses = LoadPulses
#analyse_pulses = MakeAnalysedPulses
#detector_pulses = MakeDetectorPulses
#SavePulses

## Plot TAPS
#PlotTime
#PlotAmplitude
#PlotIntegral

## Plot TDPs
#PlotTDPs

## Data Quality
#dq_amp = IslandAmplitude
#dq_len = IslandLength

[ analyse_pulses ]
#analyse_channels=all
#default_fast_generator=FirstComplete
#default_slow_generator=FirstComplete

[ detector_pulses]
#time_difference=1

[ load_pulses ]
#file_name=input.root

[ plot_pedestal_and_noise ]
export_sql=1

# vim: set syn=cfg:

I can extend the bash script to also subsequently run with the production.cfg script so that we can do a small number of runs now, if you like.

benkrikler commented 10 years ago

I had a very similar looking script which should work with the batch system in a parametric job:

#! /bin/bash

run="$GE_TASK_ID"
InputDir="$HOME/data/tree"
OutputDir="$HOME/data/output"

InFile="$InputDir/tree0$run.root"
OutFile="$InputDir/out$run.root"
ModFile="$1"

RootanaCmd="$HOME/AlcapDAQ/analyzer/rootana/rootana"

# setup env
. $HOME/AlcapDAQ/thisdaq.sh

# run rootana
$RootanaCmd -m "$ModFile" -i "$InFile" -o "$OutFile"

And submit it like:

qsub -q short.q -t 3563:3574 ./run-batch.sh -- production.cfg

AndrewEdmonds11 commented 10 years ago

Ok, I've generated the sqlite file for the first 10 events (do you want more?) and then I'll leave you guys to do the proper run. I don't think it will take long to run but, if it does, I can collate the results tomorrow morning (my time).

AndrewEdmonds11 commented 10 years ago

The sqlite file is in ~edmonds_a/AlcapDAQ/analyzer/rootana/ so feel free to link against it. If you want more runs then the script is up there (a quick question: if you ran it, would it just add lines to the linked database?). Anyway, I'm off home now so let me know if anything needs doing tomorrow before the meeting.

benkrikler commented 10 years ago

Have you set the access permissions for us to be able to write to that file?

benkrikler commented 10 years ago

I've been able to do all of the runs fr Al50b quite quickly with the batch system, but sqlite locks the database so the values cant be written to it if we run in parallel...

jrquirk commented 10 years ago

The pedestal/noise database?

benkrikler commented 10 years ago

The pedestal/noise database?

Yes. I'm looking through the root documentation to see if there's anything we can do, but I've no experience with this so if you do and could help me out that would be great. In the mean time I'll just start running in serial...

alcap-org / AlcapDAQ

Rootana production 2 #166