Closed acviana closed 10 years ago
Here are the command and output to list the tables:
mysql> show tables;
+----------------------+
| Tables_in_test_wally |
+----------------------+
| finders |
| master_finders |
| master_images |
| sub_images |
+----------------------+
4 rows in set (0.00 sec)
Great. Now show me the following:
master_images
dr_mode
is something other than 'center' or 'wide'master_images
:mysql> select count(*) from master_images;
+----------+
| count(*) |
+----------+
| 12988 |
+----------+
1 row in set (0.00 sec)
drz_mode
is something other than 'center' or 'wide':mysql> select count(*) from master_images where drz_mode <> "center" and drz_mode <> "wide";
+----------+
| count(*) |
+----------+
| 3096 |
+----------+
1 row in set (0.00 sec)
Good, now show:
master_images
where drz_mode
is wide
master_images
where drz_mode
is wide
and cr_mode
is no_cr
master_images
where drz_mode
is wide
and cr_mode
is cr
Do these numbers look right? Do the same thing for the center
images.
drz_mode
is wide
:mysql> select count(*) from master_images where drz_mode = "wide";
+----------+
| count(*) |
+----------+
| 5443 |
+----------+
1 row in set (0.01 sec)
drz_mode
is wide
and cr_mode
is no_cr
:drz_mode
is wide
and cr_mode
is cr
:mysql> select count(*) from master_images where name like "%_cr_%" and drz_mode = "wide";
+----------+
| count(*) |
+----------+
| 2932 |
+----------+
1 row in set (0.01 sec)
drz_mode
is center
:mysql> select count(*) from master_images where drz_mode = "center";
+----------+
| count(*) |
+----------+
| 4449 |
+----------+
1 row in set (0.01 sec)
drz_mode
is center
and cr_mode
is no_cr
:drz_mode
is center
and cr_mode
is cr
:mysql> select count(*) from master_images where name like "%_cr_%" and drz_mode = "center";
+----------+
| count(*) |
+----------+
| 2158 |
+----------+
1 row in set (0.01 sec)
There's no cr_mode
attribute in master_images
.
Really? Do you see the same thing I see?
> DESCRIBE master_images;
+------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| project_id | int(11) | YES | | NULL | |
| name | varchar(50) | YES | UNI | NULL | |
| fits_file | varchar(50) | YES | UNI | NULL | |
| object_name | varchar(50) | YES | | NULL | |
| set_id | int(11) | YES | | NULL | |
| set_index | int(11) | YES | | NULL | |
| width | int(11) | YES | | NULL | |
| height | int(11) | YES | | NULL | |
| minimum_ra | double | YES | | NULL | |
| minimum_dec | double | YES | | NULL | |
| maximum_ra | double | YES | | NULL | |
| maximum_dec | double | YES | | NULL | |
| pixel_resolution | float | YES | | NULL | |
| priority | int(11) | YES | | NULL | |
| description | varchar(50) | YES | | NULL | |
| file_location | varchar(100) | YES | | NULL | |
| visit | varchar(11) | YES | | NULL | |
| orbit | int(11) | YES | | NULL | |
| drz_mode | varchar(6) | YES | | NULL | |
| cr_mode | varchar(6) | YES | | NULL | |
+------------------+--------------+------+-----+---------+----------------+
21 rows in set (0.00 sec)
mysql> SELECT DISTINCT cr_mode FROM master_images;
+---------+
| cr_mode |
+---------+
| no_cr |
| cr |
+---------+
2 rows in set (0.04 sec)
I see this:
mysql> DESCRIBE master_images;
+------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| project_id | int(11) | YES | | NULL | |
| name | varchar(50) | YES | UNI | NULL | |
| fits_file | varchar(50) | YES | UNI | NULL | |
| object_name | varchar(50) | YES | | NULL | |
| set_id | int(11) | YES | | NULL | |
| set_index | int(11) | YES | | NULL | |
| width | int(11) | YES | | NULL | |
| height | int(11) | YES | | NULL | |
| minimum_ra | double | YES | | NULL | |
| minimum_dec | double | YES | | NULL | |
| maximum_ra | double | YES | | NULL | |
| maximum_dec | double | YES | | NULL | |
| pixel_resolution | float | YES | | NULL | |
| priority | int(11) | YES | | NULL | |
| description | varchar(50) | YES | | NULL | |
| file_location | varchar(100) | YES | | NULL | |
| visit | varchar(11) | YES | | NULL | |
| orbit | int(11) | YES | | NULL | |
| drz_mode | varchar(6) | YES | | NULL | |
+------------------+--------------+------+-----+---------+----------------+
20 rows in set (0.00 sec)
Ok, try loading /astro/mtpipeline/database_dumps/mtdump-2014-07-10.sql
and running all those commands again.
mysql> show tables;
+------------------+
| Tables_in_mtpipe |
+------------------+
| finders |
| master_finders |
| master_images |
| sub_images |
+------------------+
4 rows in set (0.00 sec)
master_images
:mysql> select count(*) from master_images;
+----------+
| count(*) |
+----------+
| 12988 |
+----------+
1 row in set (0.01 sec)
drz_mode
is something other than 'center' or 'wide':mysql> select count(*) from master_images where drz_mode <> "center" and drz_mode <> "wide";
+----------+
| count(*) |
+----------+
| 3096 |
+----------+
1 row in set (0.00 sec)
master_images
where drz_mode
is wide
:mysql> select count(*) from master_images where drz_mode = "wide";
+----------+
| count(*) |
+----------+
| 5443 |
+----------+
1 row in set (0.00 sec)
master_images
where drz_mode
is wide
and cr_mode
is no_cr
:mysql> select count(*) from master_images where drz_mode = "wide" and cr_mode = "no_cr";
+----------+
| count(*) |
+----------+
| 3302 |
+----------+
1 row in set (0.01 sec)
drz_mode
is wide
and cr_mode
is cr
:mysql> select count(*) from master_images where drz_mode = "wide" and cr_mode = "cr";
+----------+
| count(*) |
+----------+
| 2141 |
+----------+
1 row in set (0.01 sec)
master_images
where drz_mode
is wide
:mysql> select count(*) from master_images where drz_mode = "center";
+----------+
| count(*) |
+----------+
| 4449 |
+----------+
1 row in set (0.01 sec)
master_images
where drz_mode
is wide
and cr_mode
is no_cr
:mysql> select count(*) from master_images where drz_mode = "center" and cr_mode = "no_cr";
+----------+
| count(*) |
+----------+
| 2308 |
+----------+
1 row in set (0.01 sec)
drz_mode
is wide
and cr_mode
is cr
:mysql> select count(*) from master_images where drz_mode = "center" and cr_mode = "cr";
+----------+
| count(*) |
+----------+
| 2141 |
+----------+
1 row in set (0.00 sec)
It seems to be missing 1161
wide
cr
files and 167
center
cr
files.
So I think we need to process 3096
files and reprocess 1328
files.
Yeah, things don't look great from this dump in terms of completeness.
drz_mode | cr_mode | records |
---|---|---|
center | no_cr | 2308 |
center | cr | 2141 |
wide | no_cr | 3302 |
wide | cr | 2141 |
What I'm trying to figure out here is if this dump is worth saving or if we should start over from scratch. most of the computation time is in the ephemerides retrieval though. From @kevinfhale's work on Issue #89 we know each request takes <1 sec with the new CGI request interface, so we can estimate this processing time.
Find the number of records in the master_finders
table that are connected tomaster_images
results with either a wide
or center
drz_mode
. You can do this with a JOIN
on the master_images.id
and master_finders.master_images_id
fields.
Also, give me the number of _single_sci.fits
files in the file system, this will give an idea of the completeness of the current database dump. Between these two number we should be able to get some idea of the total processing time required to rebuild this table and thus if it's the right thing to do.
master_finders
that are connected to master_images
:mysql> select count(*) from master_finders inner join master_images on master_images.id = master_finders.master_images_id where master_images.drz_mode = "wide" or master_images.drz_mode = "center";
+----------+
| count(*) |
+----------+
| 225938 |
+----------+
1 row in set (0.06 sec)
I hope the SQL query is correct.
_single_sci.fits
files in the file system: 42869
Here is the script I used to get this number:
import glob
list_of_fits = glob.glob('/astro/mtpipeline/mtpipeline_outputs/wfpc2/*_*/*_single_sci.fits')
print len(list_of_fits)
OK, so let's refine that last estimate a little. Is the number of *_single_sci.fits
images the same as the number of images that need ephemerides? What objects do need ephemerides for? Hint, this is an astronomy question not a software question.
Based on that can you refine that number above to give a more accurate list of the number of *_single_sci.fits
images we need ephemerides for?
No, it isn't. The objects that need ephemerides are: c0m_wide_single_sci.fits
, c0m_center_single_sci.fits
, cr_c0m_wide_single_sci.fits
ands cr_c0m_center_single_sci.fits
.
The number of *_single_sci.fits
files in the system that need ephemerides is 42496
:
cr_c0m_center_single_sci.fits
: 10664
cr_c0m_wide_single_sci.fits
: 10664
c0m_center_single_sci.fits
: 10584
c0m_wide_single_sci.fits
: 10584
There are 112565
images without ephemerides in the database (ephem_x
or ephem_y
is NULL
).
There are ephemerides for 113373
images in the database. We still need ephemerides for 184099
*_single_sci.fits
images.
Yes, that's true. However, there is another criteria I was thinking of: moons. The purpose of the ephemerides, as I understand it, is mark incidental observations. Meaning, if Jupiter is the target, I know that Jupiter will be in the image. But, will any of the moons? The only way to calculate that is the find the epemerides for all the moons of each planet.
So where I'm going with this is that we need a way to determine which targets have moons. Obviously this is just the planets, but there is no standardized naming convention enforced. Meaning that if you were observing Jupiter with WFPC2 in the 90s you were free to identify your target as jupiter
, JUPITER
, Jupiter
, Jup
, Jup-Red-Spot
, Jupiter-S-Hemi
, etc. So we'll have to be smart about how we pick those observations out from the db.
Any thought on a query of either the file system or the database that would return the list of observations that have moons?
So, I need to think of a query that would return the number of observations that have moons. But do I need to search among the planets or just the moons? Do the moons need to be in a planet image?
So I think you would need to start with a list of all the target names. You could get that from the DB or from the folder names. Then you need to figure out which ones are planets or moons. I would include moons because given the field of view (FOV) of the cameras it's likely that an exposure of any one moon would include others.
Basically, you want to exclude the observations of comets, asteroids, that one planet without any moons. Those guys.
The moons are listed in mtpipeline/ephem/planets_and_moons.txt
, right?
What we could do is get those moons, do a query in the db and compare to see how many of them the db has stored. Do you think this would work?
The names of the moons are listed in mtpipeline/ephem/planets_and_moons.txt
but the problem is that those names aren't the names of the target
so they won't be listed. Instead I think we need to look at the list of unique target names and decide which ones have moons by the target name and skip the rest.
Then we'll have to figure out a way to the build_master_finders_table.py
to only use those records.
Where are the target
names located?
This is the script I implemented using the idea I had:
#! /usr/bin/env python
from mtpipeline.get_settings import SETTINGS
from mtpipeline.database.database_interface import Base, engine, session
from mtpipeline.database.database_interface import MasterImages
def get_moons(moons):
moons_list = []
with open(moons, 'r') as moons_file:
for obj in moons_file:
pair = obj.split(' ')
if len(pair) > 3:
if pair[0][-2:] != '99':
moons_list.append(pair[1].upper())
return moons_list
def db_moons():
count = 0
found = []
moons_list = get_moons('mtpipeline/ephem/planets_and_moons.txt')
for moon in moons_list:
count += len(session.query(MasterImages.object_name).filter(MasterImages.object_name.like('%' + moon + '%')).all())
return count
if __name__ == '__main__':
print db_moons()
It returns 6460
objects.
I checked the filesystem and I got the same result. There are 6460
moon observations in the filesystem.
Run a query to see how many unique object_name
values there are in master_images
. object_name
is the same as targname
in the FITS header.
From that list we need to figure out which targets have moons and find a way to save that information so that when we run build_master_finders_table.py
it populates those records with none
or something for the appropriate fields.
There are 721
distinct objects in the filesystem and 113
of them are moon observations.
I have a problem with my MySQL again. I will show the numbers from the db when I solve it.
Jim rebuilt the MySQL package. I'll have to reprocess everything again. However, I still can't open MySQL in my machine. I sent another email to support but no one answered me yet.
That's frustrating. Not a big deal because it was only a few hours of processing but if this happens again take a database dump before letting them work on it.
I wouldn't be able to do it because I couldn't use MySQL, only if I had saved it when I migrated the data to master_finders
.
Right, it's fine. Everyone kills a DB once and learns their lesson about backups :smile:
So, there are 721
distinct objects in the master_images
table and 113
of them are moon observations.
What should I do about build_master_finders_table.py
?
We could store the list of moons in a file and use it while running the script.
So how did you determine which of the objects in master_images
were "moon observations"?
Using mtpipeline/ephem/planets_and_moons.txt
(I did not include the planets present on this list). If the object_name
in master_images
had the name of a moon in the txt file, then I supposed it would be a moon observation.
Can you share, say, the 1st 50 object_names
that your method picks as not having moons?
Here are the 1st 50:
DIONE
HERMIONE
IAPETUS-CASSINI-REGIO
IO
IO-018
IO-090
IO-162
IO-234
IO-306
IO-CML154-LARGE
IO-CML199-MEDIUM
IO-CML23-LARGE
IO-CML23-MEDIUM
IO-CML272-LARGE
IO-CML331-LARGE
IO-CML333-MEDIUM
IO-CML99-MEDIUM
IO-ECL-EGRESS
IO-ECLIPSE
IO-ECLIPSE-NOCML
IO-ECLIPSED
IO-LON-045
IO-LON-135
IO-LON-225
IO-LON-315
IO-LON100
IO-LON160
IO-LON215
IO-LON250
IO-LON320
IO-LON340
IO-LON340-2
IO-LON70
IO-NUV-E
IO-NUV-W
IO-TORUS-135
IO-TORUS-225
IO-TORUS-225-1
IO-TORUS-315
IO-TORUS-45
IO-TORUS-ECLIPSE
IO-TORUS-WEST
IO-TRANSIT
IO-V1-LON165
IO-V2-I24B
IO-V3-ACALA
IO-W-FUV
IO-WEST
JUP-IOFOOT-IMAGE
JUPITER-AURORA-IO
Are those the rejected ones? Because 100% of those have moons.
Maybe this is the point that I haven't been clear about, we're not looking for observations where the object_name
is a moon. We're looking for observations that may contain moons. The trivial case is when the moon is the target.
The other part of the problem is finding all the observations that could contain moons. For example, any observations of Mars or either of it's moons, Phobos and Deimos, should be checked for incidental moon observations.
Both of these are inherently string parsing issues. Take a closer look at the list you just posted and you should see the issue.
Sorry, I got the wrong ones. Here are the ones you asked:
00CE105
00CF105
00CG105
00CH105
00CK105
00CL104
00CO105
00CP104
00CQ105
00CQ114
00EE173
00FD8
00FS53
00GE147
00GP183
00GV146
00JF81
00JG81
00KK4
00KL4
00OH67
00OK67
00OL67
00ON67
00OO67
00PD30
00PX29
00QE226
00WT169
00WV12
00WW12
00YA2
00YB2
00YH2
00YX1
01HY65
01HZ58
01KF76
01KG76
01KL76
01KN76
01OQ108
01PK47
01QB298
01QD298
01QE298
01QJ298
01QQ322
01QR297
01QR322
Hah, OK, that's better :smile:
Weird, what are these?
I think most of them are KBOs. 00CE105
is in /astro/mtpipeline/mtpipeline_outputs/wfpc2/09060_kbos/
.
Yeah, that's probably what it is. But I'm surprised that only 113 of the 721 images are either planets or moons. I can't help but suspect that there's some string parsing issue going on. Take a look at the list and see if anything pops out at you.
No, 113
of them are observations that have moon. I didn't include the planets present in the planets_and_moons.txt
file. And it doesn't have the list of all planets.
Include the planets. Finding all the variations in the planet names is actually the problem I want you to solve.
From the 42496
records in master_images
, 28908
of them are observations for planets and moons.
The number of distinct observations for planets and moons is 296
(from a total of 721
).
Ok so about 75% of the total records and 40% of the total targets. That seems more reasonable to me. And are the excluded distinct observations all looking like KBOs, asteroids, comets, and such?
Yes, they do (like ELEKTRA
, VESTA
, HALE-BOPP
). However, there might be some that are dwarf planets observations like CERES
and clusters like NGC1850-CENTER
.
We have a database of most of the ephemerides for the FITS images that were processed prior to the summer. We need to do some digging into that database dump to see how much more data need to be processed and/or reprocessed.
To start off load this MySQL database dump into your localhost:
/astro/mtpipeline/database_dumps/mtdump-2014-01-19.sql
Once you've done that show the command and the output to list the tables in the database.