Open kanchukaitis opened 2 years ago
The error message should make the user feel shame about submitting a file with duplicated series
With @ifeoluwaale back, we can address this. @kanchukaitis, do you have an error message we can look at? Or is the file you uploaded here an example of an input file that causes the error?
Edit: are there any other specific sample files from ITRDB that we can look at? So @ifeoluwaale can break readers.py some more (and fix all the problems)
Hi @CosiMichele @ifeoluwaale - yeah, viet001.rwl is giving an error. Ideally, we should now aim to be able to acquire ANY ITRDB rwl file and read it in (not just the 3 test files we have). Even if a read ultimately fails, we need verbose error output showing where the failure occurs. But yes, let's start with the attached viet001.rwl and go from there.
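One way to get "verbose error output of where the failure is occurring" is to raise an exception that carries the file name, line number, and offending line. The sketch below is only an illustration: `RwlParseError` and `parse_data_fields` are hypothetical names, not part of the current readers.py, and the input is assumed to be lines holding only the measurement fields (the series ID and decade are assumed to have been stripped already).

```python
class RwlParseError(ValueError):
    """Hypothetical exception that records where in the file parsing failed."""

    def __init__(self, filename, lineno, line, reason):
        self.filename = filename
        self.lineno = lineno
        self.line = line
        self.reason = reason
        super().__init__(f"{filename}, line {lineno}: {reason}\n    {line!r}")


def parse_data_fields(lines, filename="<rwl>"):
    """Convert whitespace-separated measurement fields to integers.

    `lines` is an iterable of strings containing only measurement fields.
    Any field that is not an integer aborts the read with a located error.
    """
    rows = []
    for lineno, line in enumerate(lines, start=1):
        try:
            rows.append([int(field) for field in line.split()])
        except ValueError as exc:
            # Re-raise with the location attached so the user can find the line.
            raise RwlParseError(filename, lineno, line.rstrip("\n"), str(exc)) from exc
    return rows
```

With something like this, a failure on viet001.rwl would report the file name and line number up front instead of a bare traceback from deep inside the reader.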
@CosiMichele @ifeoluwaale - here are some others to try with various challenges:
https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/asia/th001.rwl https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/northamerica/canada/cana157.rwl https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/northamerica/canada/cana323.rwl
All three have some classic challenges typical of some of the LDEO rwl files (particularly in series names)
Another one to code for is the rare (but real) case where years go back before 1 CE. The negative year subscript causes issues with series names. E.g.,
https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/northamerica/usa/ca667.rwl
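A rough sketch of how the negative-year case might be handled. It assumes the commonly described Tucson-style layout (series ID in columns 1-8, decade year right-aligned in columns 9-12, measurements after that), so a five-character year such as `-1000` spills one column into the ID field. `split_id_and_year` is a hypothetical helper, not an existing dplPy function, and the heuristic is ambiguous for IDs that themselves end in a hyphen.

```python
import re

# Assumption: the ID and the decade year together occupy the first 12 columns.
# The year is an optionally negative number of up to four digits at the end
# of that span; everything before it (minus padding) is taken as the ID.
_HEADER = re.compile(r"^(?P<id>.+?)\s*(?P<year>-?\d{1,4})$")


def split_id_and_year(line):
    """Split the first 12 columns of a data line into (series_id, year)."""
    header = line[:12].rstrip()
    match = _HEADER.match(header)
    if match is None:
        raise ValueError(f"cannot find a decade year in {header!r}")
    return match.group("id"), int(match.group("year"))


# Usage: a positive year, and a negative year that crowds the ID field.
print(split_id_and_year("ABC01A  1950   123   456"))
print(split_id_and_year("ABC01A1-1000   123   456"))
```

The design choice here is to anchor on the end of the 12-column span rather than on a fixed split at column 8, since the minus sign is what breaks the fixed split.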
From: Kevin Anchukaitis. Date: Tuesday, August 23, 2022 at 10:22 AM. To: OpenDendro/dplPy. Cc: Andy Bunn. Subject: Re: [OpenDendro/dplPy] readers.py chokes on rwl when there are repeated series IDs (Issue #33)
@ifeoluwaale is looking into whether/how we solved this before we close it. Procedure would be (1) identify repeated sample identifications, (2) warn/yell at user, and then optionally (3) rename one or more series in a predictable way, but not using common core IDs (A, B, C, etc. risk making the problem worse). Opinions, @AndyBunn, about the best way to deal with this?
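The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `dedupe_series_ids` is a hypothetical helper (not the current readers.py behavior), it assumes the reader has already collected one ID per series block, and the numeric `_2`, `_3` suffix is just one predictable scheme that avoids core-style letters.

```python
import warnings
from collections import Counter


def dedupe_series_ids(series_ids):
    """Return IDs with repeats renamed, warning about each repeated ID.

    The first occurrence keeps its name; later ones get "_2", "_3", ...
    """
    # Step 1: identify repeated IDs.
    counts = Counter(series_ids)
    repeated = sorted(sid for sid, n in counts.items() if n > 1)

    # Step 2: warn the user, naming every repeated ID.
    if repeated:
        warnings.warn(
            "Repeated series IDs found: " + ", ".join(repeated)
            + ". Later occurrences were renamed with a numeric suffix.",
            UserWarning,
            stacklevel=2,
        )

    # Step 3: rename later occurrences predictably.
    taken = set(series_ids)
    seen = Counter()
    result = []
    for sid in series_ids:
        seen[sid] += 1
        if seen[sid] == 1:
            result.append(sid)
            continue
        n = seen[sid]
        candidate = f"{sid}_{n}"
        while candidate in taken:  # avoid colliding with an existing ID
            n += 1
            candidate = f"{sid}_{n}"
        taken.add(candidate)
        result.append(candidate)
    return result
```

Making the rename optional (for example, raising instead of renaming when the user asks for strict behavior) would be a small extension of the same function.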
This is a common problem with dplR too. readers.py needs a way to deal with repeated sample IDs: either a verbose warning or a modification of the sample ID (e.g. adding an underscore). In general, we need to test readers.py with a variety of .rwl files (not just the idealized test ones), and we need informative error messages.

viet001.rwl.txt