JeffFerguson / gepsio

Gepsio is a document object model for XBRL documents based on .NET 6.
http://gepsio.wordpress.com
MIT License

Constructing `LinkbaseDocument` fails when `DocumentPath` is an absolute URI #28

Open jfoshee opened 5 years ago

jfoshee commented 5 years ago

Repro Steps

using JeffFerguson.Gepsio;

// Load Campbell Soup 2019 10-K
new XbrlDocument()
  .Load("https://www.sec.gov/Archives/edgar/data/16732/000001673219000070/cpb-20190728.xml");

Symptoms

System.Net.WebException : The remote server returned an error: (404) Not Found.

StackTrace
   at System.Net.HttpWebRequest.GetResponse()
   at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
   at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials, IWebProxy proxy, RequestCachePolicy cachePolicy)
   at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
   at System.Xml.XmlTextReaderImpl.OpenUrl()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
   at System.Xml.XmlDocument.Load(XmlReader reader)
   at System.Xml.XmlDocument.Load(String filename)
   at JeffFerguson.Gepsio.Xml.Implementation.SystemXml.Document.Load(String path) in D:\gepsio\JeffFerguson.Gepsio\Xml\Implementations\SystemXml\Document.cs:line 29
   at JeffFerguson.Gepsio.LinkbaseDocument..ctor(String ContainingDocumentUri, String DocumentPath) in D:\gepsio\JeffFerguson.Gepsio\LinkbaseDocument.cs:line 25
   at JeffFerguson.Gepsio.DefinitionLinkbaseDocument..ctor(String ContainingDocumentUri, String DocumentPath) in D:\gepsio\JeffFerguson.Gepsio\DefinitionLinkbaseDocument.cs:line 17
   at JeffFerguson.Gepsio.LinkbaseDocumentCollection.ReadLinkbaseReference(String ContainingDocumentUri, INode LinkbaseReferenceNode) in D:\gepsio\JeffFerguson.Gepsio\LinkbaseDocumentCollection.cs:line 146
   at JeffFerguson.Gepsio.LinkbaseDocumentCollection.ReadLinkbaseReferences(String ContainingDocumentUri, INode parentNode) in D:\gepsio\JeffFerguson.Gepsio\LinkbaseDocumentCollection.cs:line 133
   at JeffFerguson.Gepsio.XbrlSchema.ReadAppInfo(INode AppInfoNode) in D:\gepsio\JeffFerguson.Gepsio\XbrlSchema.cs:line 456
   at JeffFerguson.Gepsio.XbrlSchema.ReadAnnotations(INode AnnotationNode) in D:\gepsio\JeffFerguson.Gepsio\XbrlSchema.cs:line 448
   at JeffFerguson.Gepsio.XbrlSchema.LookForAnnotations() in D:\gepsio\JeffFerguson.Gepsio\XbrlSchema.cs:line 437
   at JeffFerguson.Gepsio.XbrlSchema..ctor(XbrlFragment ContainingXbrlFragment, String SchemaFilename, String BaseDirectory) in D:\gepsio\JeffFerguson.Gepsio\XbrlSchema.cs:line 217
   at JeffFerguson.Gepsio.XbrlSchemaCollection.GetSchemaFromTargetNamespace(String targetNamespace, XbrlFragment parentFragment) in D:\gepsio\JeffFerguson.Gepsio\XbrlSchemaCollection.cs:line 311
   at JeffFerguson.Gepsio.Item.GetSchemaElementFromSchema() in D:\gepsio\JeffFerguson.Gepsio\Item.cs:line 235
   at JeffFerguson.Gepsio.Item..ctor(XbrlFragment ParentFragment, INode ItemNode) in D:\gepsio\JeffFerguson.Gepsio\Item.cs:line 133
   at JeffFerguson.Gepsio.Fact.Create(XbrlFragment ParentFragment, INode FactNode) in D:\gepsio\JeffFerguson.Gepsio\Fact.cs:line 52
   at JeffFerguson.Gepsio.XbrlFragment.ReadFacts() in D:\gepsio\JeffFerguson.Gepsio\XbrlFragment.cs:line 515
   at JeffFerguson.Gepsio.XbrlFragment..ctor(XbrlDocument ParentDocument, INamespaceManager namespaceManager, INode XbrlRootNode) in D:\gepsio\JeffFerguson.Gepsio\XbrlFragment.cs:line 169
   at JeffFerguson.Gepsio.XbrlDocument.Parse(IDocument doc) in D:\gepsio\JeffFerguson.Gepsio\XbrlDocument.cs:line 275
   at JeffFerguson.Gepsio.XbrlDocument.Load(String Filename) in D:\gepsio\JeffFerguson.Gepsio\XbrlDocument.cs:line 179

Analysis

The problem arises because an invalid linkbase path is constructed by concatenating two absolute URIs: the directory of the containing document and an xlink:href that is already absolute. The resulting URL does not exist, which leads to the 404.

We can see where things start to go wrong in this call to build a collection of linkbase references. Note that the xlink:href in the second linkbaseRef is absolute.

LinkbaseDocumentCollection.ReadLinkbaseReferences(
    string ContainingDocumentUri = "http://xbrl.fasb.org/us-gaap/2018/elts/us-gaap-2018-01-31.xsd", 
    INode parentNode = { ChildNodes = [
        <link:linkbaseRef xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase" 
                        xlink:role="http://www.xbrl.org/2003/role/definitionLinkbaseRef" 
                        xlink:type="simple" 
                        xlink:href="../elts/us-gaap-eedm-def-2018-01-31.xml" 
                        xmlns:xlink="http://www.w3.org/1999/xlink" 
                        xmlns:link="http://www.xbrl.org/2003/linkbase" />,
        <link:linkbaseRef xlink:arcrole="http://www.w3.org/1999/xlink/properties/linkbase" 
                        xlink:role="http://www.xbrl.org/2003/role/definitionLinkbaseRef" 
                        xlink:type="simple" 
                        xlink:href="http://xbrl.fasb.org/srt/2018/elts/srt-eedm1-def-2018-01-31.xml" 
                        xmlns:xlink="http://www.w3.org/1999/xlink" 
                        xmlns:link="http://www.xbrl.org/2003/linkbase" />
    ]  } )

This calls into:

private void ReadLinkbaseReference(
    string ContainingDocumentUri = "http://xbrl.fasb.org/us-gaap/2018/elts/us-gaap-2018-01-31.xsd", 
    INode LinkbaseReferenceNode = ...)

        xlinkNode.Href = "http://xbrl.fasb.org/srt/2018/elts/srt-eedm1-def-2018-01-31.xml"

private string GetFullLinkbasePath(string ContainingDocumentUri, string LinkbaseDocFilename)
    LinkbaseDocFilename = "http://xbrl.fasb.org/srt/2018/elts/srt-eedm1-def-2018-01-31.xml"
    DocumentPath = "http://xbrl.fasb.org/us-gaap/2018/elts/"
    // Constructs an invalid path:
            FullPath = DocumentPath + LinkbaseDocFilename;

Then the exception is thrown in the LinkbaseDocument constructor:

internal LinkbaseDocument(string ContainingDocumentUri, string DocumentPath)
    thisLinkbasePath = "http://xbrl.fasb.org/us-gaap/2018/elts/http://xbrl.fasb.org/srt/2018/elts/srt-eedm1-def-2018-01-31.xml"
    // Fails:
    thisXmlDocument.Load(thisLinkbasePath);

I am new to XBRL and Gepsio, so I'm uncertain where best to make the fix. However, it does seem clear that two absolute URIs should never be concatenated. I suspect that GetFullLinkbasePath should check whether LinkbaseDocFilename is an absolute URI and, if it is, simply return it unchanged.
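A minimal sketch of that check, assuming a simplified two-argument helper. The class name `LinkbasePathSketch` and the `documentPath` parameter are hypothetical stand-ins for illustration; this is not Gepsio's actual implementation, and the relative branch simply keeps the concatenation described above.

```csharp
using System;

static class LinkbasePathSketch
{
    // Hypothetical stand-in for GetFullLinkbasePath; not Gepsio's actual code.
    // documentPath is assumed to be the directory of the containing document,
    // including its trailing separator.
    public static string GetFullLinkbasePath(string documentPath, string linkbaseDocFilename)
    {
        // An href that already parses as an absolute URI is returned unchanged.
        if (Uri.TryCreate(linkbaseDocFilename, UriKind.Absolute, out _))
            return linkbaseDocFilename;

        // Otherwise fall back to prefixing the containing document's directory.
        return documentPath + linkbaseDocFilename;
    }
}
```

With the values from the analysis above, passing the absolute srt href would return it as-is instead of appending it to "http://xbrl.fasb.org/us-gaap/2018/elts/".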

I will continue to investigate and submit a pull request if I can get it to work. Any direction is welcome.

jfoshee commented 5 years ago

Now I see that this issue was resolved in the develop branch by commit e9bf15aacca5a002c891a35b3e417835f83eb52f.

Is anything preventing this from being merged to master and released? That should close this issue and #22.

I do have a coding question: The use of System.Uri seems conspicuously absent from GetFullLinkbasePath(). Is there a reason to hand-parse URIs as opposed to using System.Uri? For example, the code will break again as more schemas are moved to https.

An alternative to

if (LinkbaseDocFilename.StartsWith("http://") == true)

may be

if (Uri.IsWellFormedUriString(LinkbaseDocFilename, UriKind.Absolute))
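To illustrate the difference between the two checks, here is a small comparison. The https URL is a hypothetical variant of the href from this issue, used only to show how each check behaves once a schema moves to https.

```csharp
using System;

// Hypothetical https variant of the href from this issue, for illustration only.
string href = "https://xbrl.fasb.org/srt/2018/elts/srt-eedm1-def-2018-01-31.xml";

// The prefix test only recognizes the literal "http://" scheme, so it misses https.
bool prefixCheck = href.StartsWith("http://");

// The Uri-based test accepts any well-formed absolute URI, regardless of scheme.
bool uriCheck = Uri.IsWellFormedUriString(href, UriKind.Absolute);

// A relative href such as the first linkbaseRef above is not an absolute URI.
bool relativeCheck = Uri.IsWellFormedUriString("../elts/us-gaap-eedm-def-2018-01-31.xml", UriKind.Absolute);

Console.WriteLine($"prefix: {prefixCheck}, uri: {uriCheck}, relative: {relativeCheck}");
```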

Similarly, it seems strange to use System.IO.Path.DirectorySeparatorChar, as that will cause the code to behave differently on *nix versus Windows for the same XML.
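One way to avoid both the hand-parsing and the platform-specific separator might be to let System.Uri do the resolution. The sketch below assumes the containing document is itself identified by an absolute URI; `UriResolutionSketch` and `ResolveHref` are hypothetical names, not part of Gepsio.

```csharp
using System;

static class UriResolutionSketch
{
    // Hypothetical helper: resolves an xlink:href against the URI of the
    // document that contains it. Assumes containingDocumentUri is absolute.
    public static Uri ResolveHref(string containingDocumentUri, string href)
    {
        var baseUri = new Uri(containingDocumentUri, UriKind.Absolute);

        // Uri(Uri, string) resolves a relative href against the base and
        // uses an absolute href as-is, so no separator handling is needed.
        return new Uri(baseUri, href);
    }
}
```

Called with the us-gaap schema URI from the analysis, the relative href "../elts/us-gaap-eedm-def-2018-01-31.xml" would resolve within the us-gaap/2018/elts directory, while the absolute srt href would come back unchanged. Local file paths would need separate consideration and are not covered by this sketch.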