Currently running `build_master_images_table.py` on `mtpipeline_dev`. According to your notes, it will take at least 1 hour and 20 minutes to run all 42496 images.
I think it will take longer than the estimated time.
It seems to only be about halfway through after ~90 min:

```
$ grep -c Processing build_master_images_table_2014-07-15-16-30.log
24855
$ tail -1 build_master_images_table_2014-07-15-16-30.log
07/15/2014 18:00:25 PM MainProcess INFO: Processing /astro/mtpipeline/mtpipeline_outputs/wfpc2/07429_uranus/png/u43h0105m_cr_c0m_wide_single_sci_linear.png
```
Take a look at your memory usage. If it's filling up, then maybe I should commit the session every few thousand records.
It's almost filling up.
Are we hitting the swap?
It's using 387.8 MB of swap.
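For reference, committing the session every few thousand records, as suggested above, could look something like the sketch below, assuming the ingest loop uses a SQLAlchemy session. The names (`ingest`, the connection string) are illustrative, not the actual mtpipeline code.

```python
# Sketch: commit the session in batches so pending records don't pile up
# in memory for the whole run. Names here are illustrative.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

BATCH_SIZE = 5000  # "every few thousand records"

def ingest(records, connection_string):
    '''Add records to the database, committing every BATCH_SIZE inserts.'''
    engine = create_engine(connection_string)
    session = sessionmaker(bind=engine)()
    for count, record in enumerate(records, start=1):
        session.add(record)
        if count % BATCH_SIZE == 0:
            session.commit()  # flush pending objects; committed ones can be garbage-collected
    session.commit()  # commit the final partial batch
    session.close()
```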
Plot the frequency of the word `Processing` as a function of time in the log file. I want to see if there is a discontinuity part-way through that would indicate hitting the swap:

```
grep Processing build_master_images_table_2014-07-15-16-30.log
```
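A short script along these lines could turn the log into that plot (a sketch, not the actual code used; the timestamp format is taken from the log line shown above, and it produces elapsed time vs. record count, the form discussed below):

```python
# Sketch: parse the timestamp of every "Processing" line and plot
# elapsed time against the number of records ingested so far.
from datetime import datetime
import matplotlib.pyplot as plt

timestamps = []
with open('build_master_images_table_2014-07-15-16-30.log') as f:
    for line in f:
        if 'Processing' in line:
            # e.g. "07/15/2014 18:00:25 PM MainProcess INFO: Processing ..."
            date, time = line.split()[:2]
            timestamps.append(datetime.strptime(date + ' ' + time, '%m/%d/%Y %H:%M:%S'))

start = timestamps[0]
elapsed_min = [(t - start).total_seconds() / 60.0 for t in timestamps]

plt.plot(range(1, len(timestamps) + 1), elapsed_min)
plt.xlabel('Records ingested')
plt.ylabel('Elapsed time (minutes)')
plt.show()
```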
Here's the plot:

[plot: elapsed time vs. number of records ingested]
So, interpret the plot for me. What do you see? What, if anything, does this suggest for our code?
After about 40 minutes of running the script, something happened that slowed the process down for a while; maybe it hit the memory limit.
I'm seeing something more complex than that. It's a little hard to tell because you are plotting time as a function of the number of ingested records and not the other way around. Also, you are plotting the total number and not the rate of ingestion. But you can infer both of these things.
Try holding a pen or a ruler up to the line on the plot. You'll notice that the rate is constant, then slows down, then speeds back up, so that at the end it's ingesting at almost the same speed as at the beginning.
I would like to hear a little analysis from you as to what this means, not just a description of the plot. Do you think this is indicative of hitting the swap? Do you think this is something we need to address now before we ingest the WFC3 and ACS datasets? How bad is the problem?
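As an aside, the rate itself could be plotted directly with something like the following sketch (again assuming the log format above; the window size is an arbitrary smoothing choice):

```python
# Sketch: ingestion rate (records/minute) over a rolling window,
# instead of the cumulative count.
from datetime import datetime
import matplotlib.pyplot as plt

timestamps = []
with open('build_master_images_table_2014-07-15-16-30.log') as f:
    for line in f:
        if 'Processing' in line:
            date, time = line.split()[:2]
            timestamps.append(datetime.strptime(date + ' ' + time, '%m/%d/%Y %H:%M:%S'))

WINDOW = 500  # records per window
rates, midpoints = [], []
for i in range(0, len(timestamps) - WINDOW, WINDOW):
    span_min = (timestamps[i + WINDOW] - timestamps[i]).total_seconds() / 60.0
    rates.append(WINDOW / span_min)
    midpoints.append((timestamps[i] - timestamps[0]).total_seconds() / 60.0)

plt.plot(midpoints, rates)
plt.xlabel('Elapsed time (minutes)')
plt.ylabel('Ingestion rate (records/minute)')
plt.show()
```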
Using a ruler, I noticed that it started slowing down after 23 minutes, even before it had processed 10 thousand files. Then, as you said, after an hour or so the rate became constant again. I can't tell if that's indicative of hitting the swap; for that I would have had to track the memory usage at every stage of the process. On average it took around 0.2 seconds to process each file (24855 files in ~90 minutes), roughly the same rate as in your notes when you thought the time was wrong; I didn't calculate the standard deviation, though.
We should look into this before going through ACS and WFC3, though I don't know exactly what to do. If we start working on ACS and WFC3 now, processing all the files may take much longer than if this issue were fixed first, assuming there is a fix for it, of course.
I am running `build_master_images_table.py` again and it's taking ~0.12 s to process each file.
:thumbsup:
If this has been completely reloaded and a database dump has been created, we can close this.
Based on the information you provided in #114, I've decided it's faster to build a new database.
Once you have completed the script in #121, modify your `settings.yaml` file to connect to a new database on your localhost called something like `mtpipeline-dev`. This will be a development database that will eventually be used to replace the database dump I sent you. Use your `database_reset.py` script to create all the tables (a sketch of what that could look like is below). Finally, run `build_master_images_table.py` on the new database. This should take a few hours to run. You can estimate the run time using my notes in #103 and see how the actual run time compares.
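For context, a table-reset script in a SQLAlchemy setup like this one typically amounts to something like the following sketch; the real `database_reset.py` and its import paths may differ, and the connection string here is hypothetical.

```python
# Sketch of what a database_reset.py could look like with SQLAlchemy
# declarative models; the real script may differ.
from sqlalchemy import create_engine

from mtpipeline.database.database_interface import Base  # hypothetical import path

def reset_database(connection_string):
    '''Drop and recreate every table defined on the declarative Base.'''
    engine = create_engine(connection_string)
    Base.metadata.drop_all(engine)    # remove existing tables, if any
    Base.metadata.create_all(engine)  # recreate them from the model definitions

if __name__ == '__main__':
    # Hypothetical connection string for the new development database.
    reset_database('mysql://user:password@localhost/mtpipeline-dev')
```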