Closed nickumia-reisys closed 1 year ago
The issue is caused by the size of the status message from the last job report. It is hard to reproduce because other harvest sources, and other jobs from the same NASA source, do not have such a large status message.
The screenshot shows one sample object_error_summary message. There are 20 of them in the status field, so the total size is more than 40k characters, exceeding the Solr strField limit of 32766.
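To make the size arithmetic concrete, here is a rough sketch of the check. The helper name `exceeds_solr_limit` and the 2048-character sample messages are made up for illustration; only the 32766 figure and the count of 20 messages come from this issue. It assumes the limit applies to the UTF-8 encoded length of the value.

```python
# Limit quoted in this issue for a Solr strField value.
SOLR_MAX_TERM_BYTES = 32766


def exceeds_solr_limit(value: str, limit: int = SOLR_MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoded value is larger than the limit.

    Hypothetical helper, not part of CKAN or ckanext-harvest.
    """
    return len(value.encode("utf-8")) > limit


# Illustrative data only: 20 error summaries of about 2k characters each,
# joined into one status string, roughly matching the sizes described above.
summaries = ["x" * 2048 for _ in range(20)]
status = "\n".join(summaries)

print(len(status), exceeds_solr_limit(status))
```

A single 2k message is well under the limit; it is the combination of 20 of them in one field that pushes the value past it.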
The proposed fix is to truncate the large error message. From
message: "Identifier: C1214353986-ASF; Title: UAVSAR_POLSAR_METADATA; 1 Error(s) Found. ### ERROR #1: 'theme':['[\n "Hayward Fault', 'CA"', '"Laurentides Reserve', 'QC', 'Canada"', '"Capitol Forest', 'WA"', '"Yellowstone National Park', 'WY"', '"Sierra', 'CA"', '"Panhandle', 'FL"', '"Isla de Coiba', 'Panama"', '"Chilean Volcanoes', 'Chile"', '"Oah', 'HI southeast"', '"Sabine Refuge', 'LA"', '"Barrier Islands', 'MS"', '"Northwest Coast', 'FL"', '"Tolima Volcano', 'Colombia"', '"Napo River', 'Peru/Ecuador"', '"SMAP Drought', 'TX"', '"SMAP MOISST Flux Tower Site', 'OK"', '"Buenos Aires Province', 'Argentina"', '"Laguna Del Maule Volcano', 'Chile/Argentin"', '"Reventador Volcano', 'Ecuador"', '"Antuco Volcano', 'Chile"', '"Chillan Volcano', 'Chile"', '"Imbabura Volcano', 'Ecuador"', '"Descabezado Grande Volcano', 'Chile"', '"Cascade Volcanoes', 'WA"', '"Tonzi Ranch', 'CA"', '"Panama Canal forests', 'Panama"', '"Cerro Negro Volcano', 'Colombia/Ecuador"', '"Rosario', 'Argentina"', '"San Antonio de Areco', 'Argentina"', '"Grand Mesa', 'CO"', '"Longview', 'TX"', '"Libreville', 'Gabon"', '"Ogooue River', 'Gabon"', '"Trout Lake', 'Canada"', '"Delta Junction', 'Alaska"', '"Yukon Flats', 'Alaska"', '"Old Crow', 'Canada"', '"Trinity River', 'TX"', '"Sabine River', 'TX"', '"Lloydminster East', 'Saskatoon"', '"South Fort Smith', 'Canada"', '"Innoko Flats"', '"Coldfoot Legacy Line"', '"TomoSAR offset line 64 meters"', '"Teller NGEE"', '"Fuego Volcano', 'Guatemala"', '"Berms TomoSAR 240m baseline"', '"Croatan National Forest', 'NC"', '"Delta Junction NEON site"', '"Ridgecrest', 'CA"', '"Atchafalaya River Delta', 'LA"', '"New Orleans Levee', 'LA"', '"Dominican Republic"', '"La Amistad International Park', 'Panama"', '"Howland Forest', 'ME"', '"Grand County', 'CO"', '"San Joaquin Valley', 'CA"', '"Corcovado National Park', 'Costa Rica"', '"East Central Coast', 'LA"', '"Lanai/Maui/Molokai/Oah', 'HI"', '"Yosemite National Park', 'CA"', '"Florida Keys', 'FL"', '"Barataria Bay', 'LA"', 
'"Huila Volcano', 'Colombia"', '"Sangay Volcano', 'Ecuador"', '"Hokkaido Volcanoes', 'Japan"', '"PiSAR-L2 Nara Totsukawa-mura', 'Japan"', '"Cordoba Province', 'Argentina"', '"PiSAR-L2 Kumamoto - Aso', 'Japan"', '"Yacamane Volcano', 'Peru"', '"Tutupaca Volcano', 'Peru"', '"Pacific Mangrove'] is not valid under any of the given schemas."
to
message: "Identifier: C1214353986-ASF; Title: UAVSAR_POLSAR_METADATA; 1 Error(s) Found. ### ERROR #1: 'theme':['[\n "Hayward Fault', 'CA"', '...] is not valid under any of the given schemas."
The PR above truncates individual error messages. With this fix, the size of the whole object_error_summary message should be greatly reduced.
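For readers who want a feel for the approach, a minimal sketch of this kind of truncation follows. It is not the code from the PR: the function name `truncate_middle`, the 200-character default, and the choice to keep both the head and the tail of the message are assumptions made for illustration, modeled on the before/after example above.

```python
def truncate_middle(message: str, max_len: int = 200, marker: str = "...") -> str:
    """Shorten a long message by replacing its middle with a marker.

    Hypothetical helper: keeps the start of the message (identifier, title,
    first values) and the end (the validation verdict), so the result stays
    readable. The result is never longer than max_len characters.
    """
    if len(message) <= max_len:
        return message
    if max_len <= len(marker):
        # Not enough room for the marker; fall back to a plain cut.
        return message[:max_len]
    keep = max_len - len(marker)
    head = (keep + 1) // 2  # characters kept from the start
    tail = keep // 2        # characters kept from the end
    return message[:head] + marker + (message[-tail:] if tail else "")


# Illustrative input only: a long list of values followed by the verdict.
long_message = (
    "1 Error(s) Found. ### ERROR #1: 'theme':["
    + ", ".join(f"'value {i}'" for i in range(500))
    + "] is not valid under any of the given schemas."
)
short_message = truncate_middle(long_message)
print(len(long_message), len(short_message))
print(short_message)
```

Applying a cap like this to each of the 20 messages bounds the combined status size, which is what keeps the field under the Solr limit.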
We need to manually run the harvest source so that the last job's error message is in good shape and the source can be successfully re-indexed into Solr.
Fix verified. NASA Data.json is saved as a monthly job, and rebuilding the index succeeds:
INFO [ckan.lib.search] Indexing just package 'nasa-data-json'...
INFO [ckan.lib.search] Finished rebuilding search index.
How to reproduce
Expected behavior
Action Successful
Actual behavior
Other notes
Sketch
Ask @FuhuXia