arkivverket / arkade5

Arkade 5 - testing tool for archive extracts
http://arkade.arkivverket.no
GNU Affero General Public License v3.0

Validate xsd for virksomhetsspesifikkeMetadata #88

Closed. pernyga closed this issue 2 years ago.

pernyga commented 6 years ago

We have additional metadata in our arkivstruktur.xml, defined by an extra schema listed in the schemaLocation attribute, e.g.:

xsi:schemaLocation="http://www.arkivverket.no/standarder/noark5/arkivstruktur arkivstruktur.xsd http://www.cbrain.dk/F2/Noark F2_Noark.xsd"

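The current build of Arkade5 does not support this: it validates only against the schemas it loads itself, so the additional schema (F2_Noark.xsd) is not taken into account.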

I have experimented with a small change to fix this, but as I'm fairly new to Git, I was hoping someone else could apply the fix. Below is a suggested code change in ValidateXmlWithSchema.cs:

Add this new method:

    // Requires: using System; using System.Collections.Generic; using System.IO;
    //           using System.Xml; using System.Xml.Schema;

    /// <summary>
    /// The XML file may reference additional XSDs in its xsi:schemaLocation attribute.
    /// This method reads that attribute, and for every schema that is not well known
    /// (www.arkivverket.no/standarder) and exists in the same folder as the XML file,
    /// opens the schema so it is loaded before validation.
    /// </summary>
    /// <param name="fullPathToXMLFile">Full path to the XML file being validated.</param>
    /// <param name="xsdResources">The schemas the file is already validated against.</param>
    /// <returns>
    /// The given schema streams followed by any additional local schemas.
    /// The caller must dispose the additional streams.
    /// </returns>
    private Stream[] ImportAdditionalXsd(string fullPathToXMLFile, params Stream[] xsdResources)
    {
        // xsd must be located in same folder as XML file
        string localPath = Path.GetDirectoryName(fullPathToXMLFile);

        var allXsdStreams = new List<Stream>(xsdResources);

        // Only the root element's attributes are needed, so read just the start of the file
        // instead of loading the whole (possibly very large) document into an XmlDocument.
        using (XmlReader reader = XmlReader.Create(fullPathToXMLFile))
        {
            reader.MoveToContent();
            string schemaLocation = reader.GetAttribute("schemaLocation", XmlSchema.InstanceNamespace);
            if (string.IsNullOrWhiteSpace(schemaLocation))
                return allXsdStreams.ToArray();

            // The value is a whitespace-separated list of "namespace location" pairs and may
            // span several lines, so drop empty entries to keep the pairs aligned.
            string[] listOfURI = schemaLocation.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            for (int uriOffset = 0; uriOffset + 1 < listOfURI.Length; uriOffset += 2)
            {
                // ignore well-known xsd's
                if (listOfURI[uriOffset].Contains("www.arkivverket.no/standarder"))
                    continue;

                // add a new stream if the xsd exists next to the XML file
                string fullPathToXsdFile = Path.Combine(localPath, listOfURI[uriOffset + 1]);
                if (File.Exists(fullPathToXsdFile))
                    allXsdStreams.Add(File.OpenRead(fullPathToXsdFile));
            }
        }

        return allXsdStreams.ToArray();
    }

Then modify the ValidateXml method:

    private void ValidateXml(string fullPathToFile, Stream fileStream, params Stream[] xsdResources)
    {
        string fileName = Path.GetFileName(fullPathToFile);

        // Use the Noark 5 archive filename for testresults:
        if (fileName.Equals(ArkadeConstants.AddmlXmlFileName))
            fileName = ArkadeConstants.ArkivuttrekkXmlFileName;

        // New: also load the additional xsd's listed in the schemaLocation attribute
        Stream[] allXsdResources = ImportAdditionalXsd(fullPathToFile, xsdResources);

        try
        {
            foreach (string validationErrorMessage in new XmlValidator().Validate(fileStream, allXsdResources))
                _testResults.Add(new TestResult(ResultType.Error, new Location(fileName), validationErrorMessage));
        }
        catch (Exception e)
        {
            string message = string.Format(Noark5Messages.ExceptionDuringXmlValidation, fileName, e.Message);
            throw new ArkadeException(message, e);
        }
        finally
        {
            // Close the schema files opened by ImportAdditionalXsd (they follow the
            // original xsdResources in the array); the original streams belong to the caller.
            for (int i = xsdResources.Length; i < allXsdResources.Length; i++)
                allXsdResources[i].Dispose();
        }
    }

pernyga commented 6 years ago

Actually, this could be solved more elegantly by simply adding a flag to the validator settings:

settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;

Assuming this XML snippet:

  xsi:schemaLocation="http://www.cbrain.dk/F2/Noark http://schema.cbrain.net/F2_Noark.xsd
                      http://www.arkivverket.no/standarder/noark5/arkivstruktur arkivstruktur.xsd
                      http://www.arkivverket.no/standarder/noark5/metadatakatalog metadatakatalog.xsd"

and by no longer loading the XSDs statically as streams passed to the Validate method, this solution would work for locally available schemas as well as downloadable ones.
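A minimal sketch of that approach, assuming validation goes through an `XmlReader` opened from the file path (not from a preloaded stream), so relative schema locations resolve against the XML file's own folder. `SchemaLocationValidator` and `Validate` are hypothetical names, not Arkade5 code. The resolver is set explicitly because some .NET versions default `XmlReaderSettings.XmlResolver` to null, in which case no schemas would be fetched.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, not part of Arkade5
public static class SchemaLocationValidator
{
    // Validates an XML file against the schemas named in its own xsi:schemaLocation
    // attribute and returns the validation messages (an empty list means valid).
    public static List<string> Validate(string fullPathToXmlFile)
    {
        var messages = new List<string>();

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            // Used to fetch the schemas named in schemaLocation (local files or http URLs)
            XmlResolver = new XmlUrlResolver()
        };
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
        // A schema that cannot be loaded is reported as a warning, so surface warnings too
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) => messages.Add($"{e.Severity}: {e.Message}");

        // Opening by path gives the reader a base URI, so a relative location such as
        // "arkivstruktur.xsd" is resolved against the XML file's folder
        using (XmlReader reader = XmlReader.Create(fullPathToXmlFile, settings))
        {
            while (reader.Read())
            {
                // Reading through the whole document drives the validation
            }
        }

        return messages;
    }
}
```

Fetching schemas over HTTP at validation time makes the result depend on network access, so for archive extracts keeping the schemas next to the XML file is probably the more predictable setup.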

joergen-vs commented 5 years ago

All associations between XML files and their XML schemas are supposed to be declared in the ADDML file.

Shortened XML:

          <dataObject name="arkivstruktur">
            <properties>
              <property name="file">
                <properties>
                  <property name="name">
                    <value>arkivstruktur.xml</value>
                  </property>
                </properties>
              </property>
              <property name="schema">
                <value>main</value>
                <properties>
                  <property name="file">
                    <properties>
                      <property name="name">
                        <value>arkivstruktur.xsd</value>
                      </property>
                    </properties>
                  </property>
                </properties>
              </property>
              <property name="schema">
                <properties>
                  <property name="file">
                    <properties>
                      <property name="name">
                        <value>metadatakatalog.xsd</value>
                      </property>
                    </properties>
                  </property>
                </properties>
              </property>
            </properties>
          </dataObject>

Or, more simplified:

dataObject of xml-file
    file-properties
        "relative filepath"
    schema-properties+
        file-properties
            "relative filepath"

So far, Arkade5 has based the validation on hardcoded values. It will soon change to follow the ADDML file instead.
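Following the structure above, one way to look up the schemas for an XML file is sketched below with LINQ to XML: find the `dataObject` whose file name matches, then collect the file name of each `schema` property. `AddmlSchemaLookup` and `SchemaFilesFor` are hypothetical names, not Arkade5's implementation; elements are matched by local name only, and the `main` value (which appears to mark the primary schema) is not interpreted.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Arkade5
public static class AddmlSchemaLookup
{
    // Returns the schema file names the ADDML declares for xmlFileName, in document order.
    public static List<string> SchemaFilesFor(XDocument addml, string xmlFileName)
    {
        foreach (XElement dataObject in addml.Descendants().Where(e => e.Name.LocalName == "dataObject"))
        {
            List<XElement> props = ChildProperties(dataObject).ToList();

            // dataObject -> property "file" -> property "name" -> value
            string fileName = Named(props, "file").Select(p => FileNameOf(p)).FirstOrDefault();
            if (fileName != xmlFileName)
                continue;

            // dataObject -> property "schema" (one or more) -> property "file" -> property "name" -> value
            return Named(props, "schema")
                .SelectMany(schema => Named(ChildProperties(schema), "file"))
                .Select(file => FileNameOf(file))
                .Where(name => name != null)
                .ToList();
        }

        return new List<string>();
    }

    // <property> elements directly inside the owner's own <properties> element
    private static IEnumerable<XElement> ChildProperties(XElement owner) =>
        owner.Elements().Where(e => e.Name.LocalName == "properties")
             .Elements().Where(e => e.Name.LocalName == "property");

    private static IEnumerable<XElement> Named(IEnumerable<XElement> properties, string name) =>
        properties.Where(p => (string)p.Attribute("name") == name);

    // Reads <property name="file"><properties><property name="name"><value>...</value>
    private static string FileNameOf(XElement fileProperty) =>
        Named(ChildProperties(fileProperty), "name")
            .Elements().Where(e => e.Name.LocalName == "value")
            .Select(v => v.Value.Trim())
            .FirstOrDefault();
}
```

The returned names are relative file paths, so a validator would still need to combine them with the extract's folder before loading the schemas.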

Did this explain it?

pernyga commented 5 years ago

That would also work ;-)

erikaaberg commented 2 years ago

Case solved - ref. "Arkade5 has so far based the validation on hardcoded values. It will soon change to abide by the addml. Did this explain it?"