This is the version used to create the current galleries.
Everything is still in Python 2, although a conversion to 3 would probably be straightforward.
The highest-numbered press release checked is PIA25121.
The code could stand a good bit of cleanup, but it currently meets our needs.
The superclass of all gallery pages is GalleryPage, defined in gallerypage.py.
is_planetary, targets, target_types, missions, hosts, keywords, caption_html, etc.
press_releases/pages subdirectory.
Subclass PiaPage handles Photojournal gallery pages at https://photojournal.jpl.nasa.gov/.
piapage/__init__.py. Use "import piapage".
piapage directory support this class.
Subclass HubblePage handles planetary web pages at https://hubblesite.org/.
The idea is that each subclass of GalleryPage handles the nuts and bolts of getting press release product info from a particular source, and then reorganizes that info into a standardized format.
https://apod.nasa.gov/apod/astropix.html).
http://pluto.jhuapl.edu/.
The GalleryPage superclass includes methods to support the scraping of information out of online captions and organizing that info as needed.
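The split between source-specific subclasses and a shared superclass can be sketched roughly as follows. This is an illustration only, not the actual GalleryPage code: the class names SketchGalleryPage and SketchPiaPage, the method fetch_raw_info, and the canned data are all hypothetical. Only the attribute names targets, missions, and keywords are taken from this document.

```python
class SketchGalleryPage(object):
    """Hypothetical stand-in for a GalleryPage-like superclass."""

    def __init__(self, product_id):
        self.product_id = product_id
        raw = self.fetch_raw_info()   # source-specific step, supplied by a subclass

        # Shared step: reorganize the raw info into standardized attributes.
        self.targets = sorted(set(t.strip().title() for t in raw.get("targets", [])))
        self.missions = sorted(set(m.strip() for m in raw.get("missions", [])))
        self.keywords = sorted(set(self.targets + self.missions))

    def fetch_raw_info(self):
        raise NotImplementedError("each subclass handles its own source")


class SketchPiaPage(SketchGalleryPage):
    """Hypothetical subclass; a real one would scrape its source's web pages."""

    def fetch_raw_info(self):
        # Canned data stands in for a download so the sketch runs offline.
        return {"targets": ["saturn ", "TITAN"], "missions": ["Cassini"]}


page = SketchPiaPage("PIA12345")
print(page.targets)
print(page.keywords)
```

Adding a new source would then mean writing one more subclass that returns the same raw structure, with the shared cleanup left untouched.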
The dicts subdirectory contains carefully curated dictionaries that define some of the info that is extracted from the captions and how extracted values are normalized.
galleries.py defines two useful functions for creating these galleries.
by_release_date takes a dictionary of GalleryPage objects, a bunch of additional information, and creates index pages that organize the selected pages by date.
by_target takes a dictionary of GalleryPage objects, a bunch of additional information, and creates index pages that organize the selected pages by target name or target type.
(e.g., "PIA12345") or as an integer (e.g., 12345) and creates a PiaPage for that product.
PDS-Galleries/PIAxxxxx.
PIA12345 is in PDS-Galleries/PIAxxxxx/PIA12xxx/PIA12345.html.
PIAPATH defines the root location of the cache, meaning the directory that contains a subdirectory called PIAxxxx.
piapage/repairs.py contains a bunch of specialized instructions for how to fix the metadata extracted from individual product pages. Things are automated to the extent possible, but there is so much variation among Photojournal pages that, in the end, I had to build a big dictionary listing keywords to add, keywords to remove, values to replace, etc.
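The cache layout described above (PIA12345 stored as PIAxxxxx/PIA12xxx/PIA12345.html) can be computed from a product code given either as a string or as an integer. A minimal sketch: the helper names pia_code and cache_path are hypothetical, five-digit codes are assumed, and root is taken to be the directory that directly holds the PIA12xxx-style subdirectories.

```python
import os


def pia_code(product):
    """Return a canonical code such as 'PIA12345' from a string or an integer."""
    if isinstance(product, int):
        number = product
    else:
        text = str(product).strip().upper()
        if text.startswith("PIA"):
            text = text[3:]
        number = int(text)
    return "PIA%05d" % number


def cache_path(root, product):
    """Return the assumed cache location of the HTML page for one product."""
    code = pia_code(product)
    subdir = code[:5] + "xxx"          # 'PIA12345' -> 'PIA12xxx'
    return os.path.join(root, subdir, code + ".html")


print(cache_path("PDS-Galleries/PIAxxxxx", 12345))
print(cache_path("PDS-Galleries/PIAxxxxx", "PIA12345"))
```

Both calls produce the same path, which is the point of normalizing the code first.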
piapage/BACKGROUND_STRING.py contains strings that are used to distinguish background text from caption text. It needs to be actively maintained.
The Cassini-Huygens mission is a cooperative project of NASA, the European Space Agency and the Italian Space Agency. The Jet Propulsion Laboratory, a division of the California Institute of Technology in Pasadena, manages the mission....
is_color and is_grayscale. This information is hard to obtain from the Photojournal, but a separate procedure works reasonably well.
piapage/make_COLOR_VS_PIAPAGE.py is a stand-alone program that scrapes the Photojournal site for information about which images are color and which are grayscale.
piapage/COLOR_VS_PIAPAGE.py, containing its results.
is_color and is_grayscale to each object.
make_COLOR_VS_PIAPAGE.py to keep this index up to date.
/Library/WebServer/Documents/press_releases directory tree, although you can modify this behavior by editing the value of class constant DOCUMENTS_FILE_ROOT_ inside gallerypage.py.
piapage/GIF_PIAPAGES.py.
check_for_missing_or_unneeded_previews.py manually before a deploy to identify any missing image files.
random-scripts/getgif.sh might be useful for downloading missing images.
piapage/training subdirectory. We're not using it now.
Make sure your PIAPATH environment variable points to a local copy of the shared Dropbox directory PDS-Galleries/PIAxxxxx. I strongly recommend that you work on a local copy, and then rsync it back to Dropbox when you are done. This avoids any chance of clobbering the copy on Dropbox.
Be sure that the file PDS-Galleries/PIAxxxxx/PIAPAGE_CATALOG.pickle has been copied from Dropbox. If it is not present, this will necessarily be a full update rather than an incremental one.
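A quick way to find out ahead of time which kind of update you are about to get is to look for the catalog file. The sketch below assumes that PIAPATH names your local copy of PDS-Galleries/PIAxxxxx, so that the pickle sits directly inside it. The function name update_mode is made up for this example.

```python
import os


def update_mode(cache_root):
    """Return 'incremental' if the catalog pickle is present, else 'full'."""
    catalog = os.path.join(cache_root, "PIAPAGE_CATALOG.pickle")
    return "incremental" if os.path.isfile(catalog) else "full"


# Assumption: PIAPATH points at the local copy of PDS-Galleries/PIAxxxxx.
root = os.environ.get("PIAPATH")
if root:
    print("The catalog check suggests this will be a %s update" % update_mode(root))
else:
    print("PIAPATH is not set")
```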
If you don't want to write any newly retrieved JPEG files into /Library/WebServer/Documents, edit DOCUMENTS_FILE_ROOT_ inside gallerypage.py. For example, if you are deploying to the "8080" version of the website first, DOCUMENTS_FILE_ROOT_ should point to /Library/WebServer/Documents_8080 instead. (This could be set up using an environment variable.)
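The environment-variable idea might look something like the sketch below. Nothing like this exists in gallerypage.py today: the variable name GALLERY_DOCUMENTS_ROOT and the helper documents_file_root are hypothetical, and whether the real constant carries a trailing slash is not something this example tries to match.

```python
import os

DEFAULT_ROOT = "/Library/WebServer/Documents"


def documents_file_root(environ=os.environ):
    """Pick the documents root from the environment, falling back to the default."""
    # GALLERY_DOCUMENTS_ROOT is a hypothetical variable name.
    return environ.get("GALLERY_DOCUMENTS_ROOT", DEFAULT_ROOT)


DOCUMENTS_FILE_ROOT_ = documents_file_root()
print(DOCUMENTS_FILE_ROOT_)
```

With that in place, a deploy to the "8080" site would set the variable to /Library/WebServer/Documents_8080 in the shell before starting Python, instead of editing the source file.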
Right now, we are only tracking Photojournal pages up to 25999. If we have started to see pages above 25900 or so, edit piapage/MAX_PIAPAGE.py to specify a higher limit.
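The rule of thumb above boils down to a one-line comparison. In this sketch, the function name limit_needs_raising and its margin parameter are invented for illustration, and how the limit is actually stored in MAX_PIAPAGE.py is not assumed.

```python
def limit_needs_raising(highest_seen, max_piapage=25999, margin=99):
    """True once the highest page number seen is within `margin` of the limit."""
    return highest_seen > max_piapage - margin


# 25121 is the highest-numbered press release checked so far.
print(limit_needs_raising(25121))
```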
cd to the rms-galleries repo directory.
In an ipython2 session...
import piapage
For an incremental update:
catalog = piapage.build_catalog()
piapage.save_catalog(catalog)
Alternatively, for a full update:
catalog = piapage.build_catalog(incremental=False)
piapage.save_catalog(catalog)
At this point,
PDS-Galleries/PIAxxxxx/PIAPAGE_CATALOG.pickle has been updated;
PDS-Galleries/PIAxxxxx;
/Library/WebServer/Documents/press_releases;
jekyll/press_releases/pages subdirectory of this repo.
To generate the new galleries, run this program at the command line (not inside ipython):
python2 piapage/piapage_galleries.py
At this point, the Jekyll galleries have been written to the jekyll/galleries subdirectory of this repo.
Before you can process the Jekyll files and push the HTML to the local website, you need to create links inside your local copy of the SETI/rms-website repo:
GALLERIES=<absolute path to rms-galleries repo>
cd <path to rms-website repo>
mkdir website_galleries
# Relative link targets are resolved from inside website_galleries, hence the "../".
ln -s ../website/_config.production.yml website_galleries/_config.production.yml
ln -s ../website/_config.yml website_galleries/_config.yml
ln -s ../website/_data website_galleries/_data
ln -s ../website/_includes website_galleries/_includes
ln -s ../website/_layouts website_galleries/_layouts
ln -s ../website/_posts website_galleries/_posts
ln -s ../website/_sass website_galleries/_sass
ln -s "$GALLERIES/jekyll/galleries" website_galleries/galleries
ln -s "$GALLERIES/jekyll/press_releases" website_galleries/press_releases
To deploy to your local copy of the website:
cd <path to rms-website repo>/deploy
fab deploy localhost_galleries
Alternatively, to deploy to the local "8080" version of the website:
cd <path to rms-website repo>/deploy
fab deploy localhost_8080_galleries
Review the local website. Visit:
https://localhost/galleries.html
https://localhost/galleries/mercury.html
https://localhost/galleries/venus.html
https://localhost/galleries/moon.html
https://localhost/galleries/mars.html
https://localhost/galleries/jupiter.html
https://localhost/galleries/saturn.html
https://localhost/galleries/uranus.html
https://localhost/galleries/neptune.html
https://localhost/galleries/pluto.html
https://localhost/galleries/asteroids.html
https://localhost/galleries/comets.html
https://localhost/galleries/kbos.html
https://localhost/galleries/exoplanets.html
https://localhost/galleries/target_mars.html
https://localhost/galleries/target_jupiter.html
https://localhost/galleries/target_saturn.html
https://localhost/galleries/target_uranus.html
https://localhost/galleries/target_neptune.html
https://localhost/galleries/target_pluto.html
https://localhost/galleries/asteroid_1_ceres.html
https://localhost/galleries/comet_1p_halley.html
https://localhost/galleries/kbo_pluto.html
https://localhost/galleries/exoplanet_55_cancri.html
https://localhost/galleries/cassini.html
https://localhost/galleries/voyager.html
https://localhost/galleries/juno.html
https://localhost/galleries/new_horizons.html
https://localhost/galleries/messenger.html
https://localhost/galleries/dawn.html
https://localhost/galleries/rosetta.html
https://localhost/galleries/cassini_jupiter.html
https://localhost/galleries/cassini_saturn.html
https://localhost/galleries/voyager_jupiter.html
https://localhost/galleries/voyager_saturn.html
https://localhost/galleries/voyager_uranus.html
https://localhost/galleries/voyager_neptune.html
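Clicking through every page by hand gets tedious, so a small script can request a handful of them and report what comes back. This is only a sketch: the helper names gallery_urls, check, and report are made up, the page names are a subset of the list above, and a local server with a self-signed certificate will probably show up as an error string rather than a status code.

```python
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

BASE = "https://localhost/galleries"
PAGES = ["mercury", "saturn", "cassini", "voyager_neptune"]


def gallery_urls(base=BASE, pages=PAGES):
    """Build the full URL for each gallery page name."""
    return ["%s/%s.html" % (base.rstrip("/"), name) for name in pages]


def check(url, timeout=10):
    """Return the HTTP status code, or the error text if the request fails."""
    try:
        return urlopen(url, timeout=timeout).getcode()
    except Exception as exc:
        return str(exc)


def report():
    """Request each page in turn and print the outcome next to its URL."""
    for url in gallery_urls():
        print("%s  %s" % (url, check(url)))
```

Call report() from an interactive session once the local deploy has finished. It does not replace looking at the pages, since it says nothing about whether the thumbnails and captions are right.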
If history is any guide, about 10% of the new images will have metadata that needs to be repaired. The simplest way to handle a small number of repairs is to itemize the changes inside the REMOVALS dictionary inside repairs.py. Just create a new dictionary entry keyed by the PIA code of the image. You can list extraneous keyword values to remove, or specify new values to insert for the target, target type, mission, etc.
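To give a feel for dictionary-driven repairs, here is a toy version of the idea. It does not reproduce the real entry format in repairs.py: the table SKETCH_REPAIRS, the field names remove_keywords and replace, the function apply_repairs, and the sample metadata are all invented for this example.

```python
# Hypothetical repair table keyed by PIA code; the real format may differ.
SKETCH_REPAIRS = {
    "PIA12345": {
        "remove_keywords": ["Sun"],
        "replace": {"targets": ["Titan"], "missions": ["Cassini"]},
    },
}


def apply_repairs(code, metadata, repairs=SKETCH_REPAIRS):
    """Return a repaired copy of `metadata`, a dict mapping names to lists."""
    fixed = dict((key, list(values)) for key, values in metadata.items())
    entry = repairs.get(code, {})

    # Drop extraneous keyword values.
    unwanted = set(entry.get("remove_keywords", []))
    fixed["keywords"] = [k for k in fixed.get("keywords", []) if k not in unwanted]

    # Insert replacement values for the target, mission, and so on.
    for key, values in entry.get("replace", {}).items():
        fixed[key] = list(values)
    return fixed


scraped = {"keywords": ["Saturn", "Sun", "Titan"], "targets": ["Saturn"]}
print(apply_repairs("PIA12345", scraped))
```

Products without an entry pass through unchanged, so the table only has to list the pages that actually need fixing.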
Once you are satisfied with the results, deploy to the public servers. Note that these commands only work from a server that has a current, up-to-date copy of the galleries and press releases.
cd <path to rms-website repo>/deploy
fab deploy server1_galleries
fab deploy server2_galleries
Copy the new JPEGs to the servers by rsyncing the contents of these directories (although we should really fix this):
/Library/WebServer/Documents/press_releases/thumbnails
/Library/WebServer/Documents/press_releases/small
/Library/WebServer/Documents/press_releases/medium
Rsync your local copy of the PDS-Galleries/PIAxxxxx directory back to Dropbox.
Commit your changes back to the repo on GitHub.