CYGNUS-RD / reconstruction

Camera and scope analysis tools

go back to the setPedestal original method #226

Closed emanueledimarco closed 11 months ago

emanueledimarco commented 1 year ago

...which I fixed by counting from the back in the manual parsing. I don't know whether the auto CSV is cleaned against this kind of possible problem.

@antorita @igorabritta this was introduced in PR #216: the pedestal runs are wrong if the "run_description" field contains a ",", which seems likely (see "Daily Calibration, parking" for Run3, or "S000:PED:BKG, no pmt" for Run1/Run2). I manually avoided this by counting fields from the back, but I don't think the new code does that.

@igorabritta do you clean the downloaded CSV to avoid these cases? Otherwise I suggest going back to the old pedestal setting.
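To illustrate the failure mode (the line, run numbers, and column names below are made up for this sketch, not the real runlog schema): a naive comma split shifts every field after the description by one, while counting from the back still works because the stray comma only sits in the description field.

```python
# Hypothetical runlog line; the real file has different columns.
header = "run_number,run_description,pedestal_run,stop_time"
line = '17001,"Daily Calibration, parking",16998,2023-05-01'

naive = line.split(',')
# The comma inside the description adds one extra field...
assert len(naive) == len(header.split(',')) + 1
# ...so the field after the description is shifted: naive[2] is
# ' parking"' instead of '16998'.
assert naive[2] != '16998'

# Counting from the back still lands on the right field, because the
# stray comma only affects positions counted from the front:
assert naive[-2] == '16998'
```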

Let's not use commas in the run_description in the future

@igorabritta @antorita @davidepinci if confirmed the problem was there, let's see since when we should re-reco the runs with the fix (I can do it)

Nota bene: in the original method the CSV used is "pedestals/runlog_LNGS.csv", in the v2 it is "pedestals/runlog_auto.csv". Let's use the same name (otherwise one has to change either the automated reco or the LNGS batch reco for no reason).

emanueledimarco commented 1 year ago

I'll wait to merge this PR until it is confirmed that there is no harm for the automated reconstruction (apart from changing the name of the CSV file)

igorabritta commented 1 year ago

Hi Emanuele,

At the beginning we chose to have two 'tables' and two 'methods' to find the pedestals, to test the compatibility of our changes.

But, in case you would like to use the "setPedestalRun_v2" function also for the old runs (before 16798), we only have to amend one line of the code.

This function does not have problems with commas because it opens the CSV with pandas, which looks at the column names, so we do not need to count backward. For the same reason, there was no need to clean the table before saving it.

Another point that is maybe worth mentioning here: we could add the following lines to our code to create the CSV table "online". That way it will always be up to date, and there will be less data (the table is almost 100 MB) to be passed to the farm/grid/condor.

import cygno as cy  # assuming the CYGNO python library, which provides read_cygno_logbook

def createPedLog():
    # get the most up-to-date logbook and create the runlog table
    df = cy.read_cygno_logbook(verbose=False)
    df.to_csv('../reconstruction/pedestals/runlog_LNGS_auto.csv', index=False)

What do you think?

NB1: I agree, let's use the same name, it was different just while testing the compatibility.

NB2: Hope I was clear, not sure that my brain is working properly haha

emanueledimarco commented 1 year ago

This function does not have problems with commas because it opens the CSV with pandas, which looks at the column names, so we do not need to count backward. For the same reason, there was no need to clean the table before saving it.

but if the input file is a CSV, even if it is read with the pandas method, and the separator of the CSV is the "," symbol, then the [column - header name] association is only guaranteed by the field order. If there is an extra "," in one of the column values, then I think all the values of the following columns are shifted by +1.
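A minimal check of this concern, with a hypothetical three-column runlog (the names are illustrative): pandas is safe when the comma-bearing field is quoted; with an unquoted comma the C parser rejects the row rather than silently shifting it.

```python
import io
import pandas as pd

cols = "run_number,run_description,pedestal_run\n"  # illustrative schema

# Quoted description: pandas keeps the comma inside one field, and the
# column -> header-name association is preserved.
good = cols + '17001,"Daily Calibration, parking",16998\n'
df = pd.read_csv(io.StringIO(good))
assert df.loc[0, "pedestal_run"] == 16998

# Unquoted comma: the row has four fields against a three-field header,
# and pandas raises ParserError instead of shifting the values.
bad = cols + "17000,a normal run,16998\n" \
           + "17001,Daily Calibration, parking,16998\n"
raised = False
try:
    pd.read_csv(io.StringIO(bad))
except pd.errors.ParserError:
    raised = True
assert raised
```

Either way the values are not silently mis-associated, but an unquoted comma would still break the job, so whether the saved CSV actually quotes the field is the crucial point.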

Another point that is maybe worth mentioning here: we could add the following lines to our code to create the CSV table "online". That way it will always be up to date, and there will be less data (the table is almost 100 MB) to be passed to the farm/grid/condor.

I think that one can do:

  1. a persistent full pedestal log (that is ~1/10 of all the data; if this becomes too large we can split it by eras, i.e. run ranges: Run1 / Run2 / Run3)
  2. an online check that the run being reconstructed is in the runlog

igorabritta commented 1 year ago

but if the input file is a CSV, even if it is read with the pandas method, and the separator of the CSV is the "," symbol, then the [column - header name] association is only guaranteed by the field order. If there is an extra "," in one of the column values, then I think all the values of the following columns are shifted by +1.

I was in doubt after your comment and did a check. I do not know if it is due to the way pandas saves or loads the data (maybe both), but it understands that the commas in the description are part of the same column. This is a screenshot of the table after saving it to a CSV file:

[screenshot of the table after saving to CSV]

and this is after loading it:

[screenshot of the table after loading]

So, I think it is safe to use it, in case.
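For reference, the screenshot check can also be reproduced in a few lines (column names are illustrative): pandas' to_csv quotes any field that contains the separator, so read_csv restores it intact.

```python
import io
import pandas as pd

df = pd.DataFrame({"run_number": [17001],
                   "run_description": ["Daily Calibration, parking"]})

buf = io.StringIO()
df.to_csv(buf, index=False)
text = buf.getvalue()
# to_csv (default quoting=QUOTE_MINIMAL) wraps the comma-bearing field:
assert '"Daily Calibration, parking"' in text

# Loading it back recovers the field, comma included.
back = pd.read_csv(io.StringIO(text))
assert back.loc[0, "run_description"] == "Daily Calibration, parking"
```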

I think that one can do:

  1. a persistent full pedestal log (that is 1/10 of all the data; if this becomes too large we can split it by eras, i.e. run ranges: Run1 / Run2 / Run3)
  2. an online check that the run being reconstructed is in the runlog

First, let me correct myself: the log tables are only 4 MB, so forget what I said before about the size.

  1. I think I did not understand what you meant by a persistent full ped log.
  2. Agree!

Let me know

emanueledimarco commented 1 year ago

I was in doubt after your comment and did a check. I do not know if it is due to the way pandas saves or loads the data (maybe both), but it understands that the commas in the description are part of the same column. This is a screenshot of the table after saving it to a CSV file:

I am not yet convinced that this is what happens in our code, or whether the "magic" of the correct splitting happens in the visualisation. Can you do a test by running the v2 method on one of these runs?

  1. I think I did not understand what you meant by persistent full ped log.

The run log is used only to:

a. find the pedestal run <= the analyzed run. A CSV with only the pedestal runs would be fine, and since we take ~1 ped run every 10 runs, the size of this CSV would be ~1/10 of the full log. But if the full log is only 4 MB, I don't see any problem in keeping the current setup for the time being.

b. in case we make the CSV with only pedestal runs, we can then check on the fly whether the run was taken.
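Point a. could be sketched like this (the column name and both helpers are hypothetical, not the actual reconstruction code):

```python
import pandas as pd

# Hypothetical pedestal-only runlog.
ped_log = pd.DataFrame({"run_number": [16900, 16950, 16998]})

def pedestal_for(run, log):
    """Return the latest pedestal run with run_number <= the analyzed run."""
    earlier = log[log["run_number"] <= run]
    if earlier.empty:
        raise ValueError(f"no pedestal run before run {run}")
    return int(earlier["run_number"].max())

def run_in_log(run, log):
    """On-the-fly check that a run appears in the runlog (point b.)."""
    return bool((log["run_number"] == run).any())

assert pedestal_for(17001, ped_log) == 16998
assert pedestal_for(16950, ped_log) == 16950
assert run_in_log(16900, ped_log)
```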

GioDho commented 11 months ago

One of the two modifications of this commit has been added via another pull request. The use of pandas has been tested further and, in case it poses a threat to the integrity of the code, another special character can be used to generate and read the .csv.
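A sketch of the special-character option (separator and column names chosen for illustration): write and read the table with a separator that never appears in run_description, e.g. ';'.

```python
import io
import pandas as pd

df = pd.DataFrame({"run_number": [17001],
                   "run_description": ["Daily Calibration, parking"]})

buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)  # commas are now ordinary characters

back = pd.read_csv(io.StringIO(buf.getvalue()), sep=";")
assert back.loc[0, "run_description"] == "Daily Calibration, parking"
```

Note that pandas already quotes comma-bearing fields by default, so this is belt-and-braces; any separator choice still relies on that character not appearing unquoted in the data.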

GioDho commented 11 months ago

Given the last comment I'll close the PR.