AAVSO / VStar

VStar is a visualisation and analysis tool for variable star data brought to you by AAVSO
https://www.aavso.org/vstar
GNU Affero General Public License v3.0
9 stars 3 forks source link

Simple text format: observation flags are not consistent between reading and writing. #140

Closed mpyat2 closed 3 years ago

mpyat2 commented 3 years ago

SimpleFormatObservationSinkPlugin writes "G" observation flag for "good" observations. In turn, SimpleTextFormatValidator() acceps only "D" flag ("discrepant"). So, simple text format, generated by VStar ["Observations": File->Save (Type=Simple)] cannot be read as "Download or Simple" source because all "good" lines contain unrecognizable "G" flag!

dbenn commented 3 years ago

Wow. This is odd @mpyat2. Given that we are writing observations that may have been loaded as AAVSO Download format in simple format, either the latter should expand to allow validation flags other than D (in the early VStar spec) or not output a validation flag at all.

Thoughts?

@SaraAAVSO, do you have a view on what is best to do here, given your recollections of the early VStar development period vs how things have moved on. I recall an evolution of validation flags in AAVSO also.

dbenn commented 3 years ago

@SaraAAVSO: some additional background. Until quite recently (the last release in 2020), saving obs was a special feature, whereas now there's a plugin category called observation sink (vs source). Internally, in the last release I converted the existing save obs code to use the new plugin interface, and the issue Max raises crept in during that conversion.

SaraAAVSO commented 3 years ago

Hi @dbenn and @mpyat2,

I don't remember how we wound up with validation flags of "B" and "D" which as far as I know were never official validation flags. As explained in the Data Download tool:

Validation Flag: This flag describes the level of validation of the observation.

V means the observation has passed our validation tests.
T means that during the validation phase it was flagged discrepant and should be used with extreme caution.
Z means it has only undergone pre-validation, meaning it was checked for typos and data input errors only.
U means it has not been validated at all and should be used with caution.

So "T" = bad while "V" and "Z" should be considered good. I'm not sure what we decided to do with "U" but I think it will be displayed along with other good data in VStar.

If someone requests data using the Data Download Tool and asks not to get discrepant data, they will get only data with valflags "V" and "Z". If they request that discrepant data is included they will also get observations with the "T" and "U" flags.

I agree that the same format should apply to reading and writing.

Does this answer your questions?

Thanks, Sara

mpyat2 commented 3 years ago

So both "G" (good) and "D" (discrepant) are purely VStar specific values. Both these flags recognized if the input format is "Download", "G" rejected if the input format is "simple" however "D" is recognizable for the "simple" format.

IMO, the simplest fix for the "simple" format is to write "D" for discrepant records and empty flag for good records.

SaraAAVSO commented 3 years ago

That seems like a reasonable solution to me Max.

Thanks, Sara

dbenn commented 3 years ago

Thanks @SaraAAVSO and @mpyat2. I went looking back at old emails. :) Have not yet found the origin (in my mind or elsewhere) of the D as discrepant, except perhaps in some old SQL. Certainly references to T as discrepant.

Page 9 of the VStar user manual nicely avoids the issue too. :(

Max's approach seems fine to me too, but I wonder about an alternative. The simple obs source code could simply use the AAVSO download format obs source validator (the class that checks whether a valflag is OK).

Any simple format files out there that still use D (if any ever did) ought to be few and far between. That way, if someone loads AAVSO download data, then saves that as simple format (which is possible now!), whatever valflag is there will be saved. If the simple format loader loads a file with an AAVSO-legal valflag, all will be well with the world. For added goodness, we could add D to the permitted valflags in the simple loader (since it's there now).

Or, maybe I'm just overthinking this...

SaraAAVSO commented 3 years ago

I really don't have a strong opinion about this either way. Whatever is less confusing for the user is probably best.

Many thanks, Sara

dbenn commented 3 years ago

Understand @SaraAAVSO. I'm tempted to go with the solution proposed by @mpyat2. I'm just puzzled by:

Max, if we go with your approach, if we have either a valflag of 'D' or 'T' in an observation, you're thinking we should just write 'D' for backward compatibility and for any other valflag in an observation, we write the empty string. Correct?

dbenn commented 3 years ago

Further to my last comment, the simple format sink code calls this function in ValidObservation:

    /**
     * Returns a line in TSV format of the following fields (bracketed fields
     * are optional):
     * 
     * [Phase,][JD,]MAGNITUDE,[UNCERTAINTY],[OBSERVER_CODE],[VALFLAG]
     * 
     * @param delimiter
     *            The field delimiter to use.
     * @param includeJD
     *            Should the JD be included in the output?
     */
    public String toSimpleFormatString(String delimiter, boolean includeJD) {
        StringBuffer buf = new StringBuffer();

        if (Mediator.getInstance().getAnalysisType() == AnalysisType.PHASE_PLOT) {
            buf.append(this.getStandardPhase());
            buf.append(delimiter);
        }

        if (includeJD && this.getDateInfo() != null) {
            buf.append(this.getDateInfo().getJulianDay());
            buf.append(delimiter);
        }

        buf.append(this.getMagnitude().isFainterThan() ? "<" : "");
        buf.append(this.getMagnitude().getMagValue());
        buf.append(delimiter);

        double uncertainty = this.getMagnitude().getUncertainty();
        // TODO: why != here and > in next method?
        if (uncertainty != 0.0) {
            buf.append(uncertainty);
        }
        buf.append(delimiter);

        if (details.keySet().contains(obsCodeKey)) {
            buf.append(details.get(obsCodeKey));
        }
        buf.append(delimiter);

        if (this.validationType != null) {
            buf.append(this.validationType.getValflag());
        }
        buf.append("\n");

        return buf.toString();
    }

the key bit being:

        if (this.validationType != null) {
            buf.append(this.validationType.getValflag());
        }

I'm still inclined to leave the sink code as it is and allow the AAVSO valflags in the simple format source.

But the problem with this is lack of backward compatibility unless the simple format validation includes 'D'.

Hmm. Definitely overthinking this...

Most uses of the simple format are I suspect usually time, mag, and in some cases maybe including uncertainty +/- observer code.

dbenn commented 3 years ago

So, I think that for now, let's go with Max's proposal. I'll create a branch for this.

dbenn commented 3 years ago

Also, I'm not sure where the 'G' instances come from since toSimpleFormatString() just uses whatever valflag was in the observation. The implication is that some observations loaded by VStar contain 'G'. Do you have an example of a loaded file that contains 'G' @mpyat2 ? From AID, AAVSO Download, or some other source?

mpyat2 commented 3 years ago

@dbenn , I do not have an example of a loaded file with 'G' (see, however, a comment in ValidationType.java in a message below).

However, this flag arose several time in VStar code/comments:

AAVSODownloadFormatValidator.java (code):

    public AAVSODownloadFormatValidator(CsvReader lineReader, int minFields,
            int maxFields, IFieldInfoSource fieldInfoSource) throws IOException {
        super("AAVSO format observation line", lineReader, minFields,
                maxFields, "G|D|T|P|V|Z", fieldInfoSource);

CommonTextFormatValidator.java (comment only):

     * @param valflagPatternStr
     *            A regex pattern representing the alternations of permitted
     *            valflags for this validator instance, e.g. "D" (simple format)
     *            or "G|D|T|P|V|Z" (AAVSO download format).
     * @param fieldInfoSource
     *            A mapping from field name to field index that makes sense for
     *            the source.
     */
    public CommonTextFormatValidator(String desc, CsvReader lineReader,
            int minFields, int maxFields, String valflagPatternStr,
            IFieldInfoSource fieldInfoSource) throws IOException {

ValflagValidator.java (comment only)::

public class ValflagValidator extends AbstractStringValidator<ValidationType> {

    private static final String KIND = LocaleProps
            .get("VALIDATION_FLAG_VALIDATOR_KIND");

    private final RegexValidator regexValidator;

    /**
     * Constructor.
     * 
     * @param valflagPatternStr
     *            A regex pattern representing the alternations of permission
     *            valflags for this validator instance, e.g. "D" (simple format)
     *            or "G|D|T|P|V|Z" (AAVSO download format). This pattern string
     *            will be wrapped in a ^(...)$ to ensure that nothing else
     *            exists in the string, and that there is one capturing group.
     */
    public ValflagValidator(String valflagPatternStr) {
        super(KIND);

Also 'G' appears in ValidationType.java, see below.

mpyat2 commented 3 years ago

The earliest version of AAVSODownloadFormatValidator.java in the repo is dated 2009-07-13. It seems it was created at that time. G and D flags are already there:

    public AAVSODownloadFormatValidator(String delimiter, int minFields,
            int maxFields, ITableFieldInfoSource fieldInfoSource) {
        super("AAVSO format observation line", delimiter, minFields, maxFields,
                "G|D|P", fieldInfoSource);

It looks like the 'P' flag is also non-standard (at least it's undocumented).

mpyat2 commented 3 years ago

I've found a note about 'P' (version on 2010-02-13): "Fixed interpretation of valflag of 'P' for AAVSO download format to mean "pre-validated"; minor comment updates. See also tracker 2950557." Also a comment from "CommonTextFormatValidator.java": // Here we convert 'P' to "pre-validated" since 'P' for database // source observations means "Published" (which is treated as "Good"). // If we ever want to use this class for database observation field // validation, we should probably just change the SQL to change 'P' // to 'G' (published to good), allowing ValidationType to treat 'Z' // and 'P' as pre-validated. Then we can get rid of the first case // below.

Flags T, V, Z where added on 2010-09-01, revision comment: "Fixed handling of V and T valflags in AAVSO download format; fixed NaNs in confidence interval calc; tweaked MessageBox exception error method message string."

And there was a flag 'Y'! (version on 2010-02-14: "Excluding obs with a valflag of "Y". See tracker 2858633. Rationalised validation type handling. Displaying human readable validation type instead of valflag in obs list table.").

The 'Y' flag currently is not accepted.

The 'U' flag is not accepted too (tested on Download Format file).

mpyat2 commented 3 years ago

See also ValidationType.java:

public enum ValidationType {

    // See https://sourceforge.net/apps/mediawiki/vstar/index.php?title=Valflag
    //
    // In older AAVSO download format files we see 'G' for "Good" instead of 'V'.
    // We could deprecate its use, but permitting it provides backward compatibility.
    //
    // According to http://www.aavso.org/data/download/downloadformat.shtml,
    // 'P' means "Pre-validated"; so there is a conflict between download
    // format and database originated validation flags. We assume this has
    // been mapped from 'P' to 'Z' in getTypeFromFlag() below.

    GOOD,
    DISCREPANT,
    PREVALIDATION,
    BAD;    

    /**
     * Given a valflag from an input file or database, return
     * the corresponding validation type.
     */
    public static ValidationType getTypeFromFlag(String valflag) {
        ValidationType valtype = BAD;

        if ("G".equals(valflag)) {
            valtype = GOOD;
        } else if ("D".equals(valflag) || "T".equals(valflag)) {
            valtype = DISCREPANT;
        } else if ("P".equals(valflag)) {
            valtype = GOOD;
        } else if ("V".equals(valflag)) {
            valtype = GOOD;
        } else if ("Z".equals(valflag)) {
            valtype = PREVALIDATION;
        }

        assert(valtype != null);

        return valtype;
    }
<...>

ValidationType.getValflag() may return the "B" flag, unacceptable at the input too.

dbenn commented 3 years ago

Thanks for the detail @mpyat2! I had not looked at the revision history yet, so thanks. Of course AAVSODownloadFormatValidator is related to AAVSO Download Format. A pity https://sourceforge.net/apps/mediawiki/vstar/index.php?title=Valflag no longer exists.

dbenn commented 3 years ago

Okay, after taking all of this on board, I decided to create a common valflags pattern (as previously only used in AAVSO Download format validator) passing this to both simple and AAVSO Download format validators since it contains "D" (used by simple validator). This provides backward compatibilty and unifies both formats with respect to valflags.

SaraAAVSO commented 3 years ago

Thanks for sorting this out you two!

I just found an old Wiki doc that says we used to use "D" for a "discrepant fainter-than observation". We decided to do away with that because there was no point in keeping an extra valflag when we can use a combination of "T" and the fainter-than boolean. I think that we also stopped using "P" (published) and "L" (good fainter-than) around the same time.

Sorry that my mind is so foggy on this. It was a long time ago.

-Sara

dbenn commented 3 years ago

Thanks @SaraAAVSO Yes, we've been doing this for awhile now and you way longer. :)

I think with the current solution we have all bases covered, including legacy cases.

All except for U it would seem. Wondering whether we should add that too?

SaraAAVSO commented 3 years ago

"U" is a relatively new flag. It means "Unvalidated (never looked at)" and is currently being applied to data sent to us via the BAA. Some of our tools are displaying it by default and other are not. For example, the Data Download tool only gives it when one selects "Show Discrepant".

I think we probably should show "U" data in VStar by default, but if the observation details could interpret this as "Unvalidated" that would be ideal.

Many thanks, Sara

dbenn commented 3 years ago

No worries @SaraAAVSO. Done. I'll add you as a reviewer to the pull request if you have any comments.