dnnsoftware / Dnn.Platform

DNN (formerly DotNetNuke) is the leading open source web content management platform (CMS) in the Microsoft ecosystem.
https://dnncommunity.org/
MIT License

Sitemap.aspx is no longer accepted by Google Search Console #3277

Open wbonekamp opened 4 years ago

wbonekamp commented 4 years ago

Open Google Search Console and add a new sitemap: domain.com/sitemap.aspx. Error: can't fetch (type unknown).

The URL sitemap.aspx is available in the browser and returns valid XML. I tested the URL in several sitemap validators and all of them pass.

When you save the output of sitemap.aspx as sitemap.xml and submit that file, the sitemap is accepted and shows the indexed pages.

Steps to reproduce

List the steps to reproduce the behavior:

  1. Open Google Search Console
  2. Add sitemap
  3. domain.com/sitemap.aspx
  4. Save

Error log

can't fetch (unknown type)

Affected version

DNN 9.2.2, DNN 9.3.2

thabaum commented 4 years ago

Can this be served as an XML file instead of .aspx? I believe that is the issue here; it has been a long-standing problem in every version since the sitemap was introduced in DNN, as far back as I can remember. This would be a good one to resolve.

The format inside the file I believe is just like an XML file. I have filed a few issues about this in the past as well.

valadas commented 4 years ago

Hmm, I never had an issue submitting sitemap.aspx. Not sure if Google changed something recently. I don't have any new site to submit right now, but all my existing sites use sitemap.aspx and get refreshed correctly.

thabaum commented 4 years ago

I believe it works with the Google and Bing webmaster sites, but it never worked for me using DNN.

wbonekamp commented 4 years ago

Try a new DNN 9 website and add your sitemap.aspx in Google Search Console. You will see it's not accepted.

And yes, sitemaps that were added in the past keep working with sitemap.aspx. The problem only exists for new websites.

valadas commented 4 years ago

I was able to reproduce this. It looks like there was a recent change or bug in Google Search Console; I would suggest upvoting https://support.google.com/webmasters/thread/14503265?hl=en

This was discussed in https://github.com/dnnsoftware/Dnn.Platform/issues/2863, which includes a way to make it work with the .xml extension. That approach relies on a URL rewrite module, which would create a new environment dependency for DNN, so it was agreed not to implement it in the core. The same issue also concluded that serving dynamic sitemaps with an .aspx extension respects industry standards, so ultimately this is currently a Google Search Console bug.

Is there another way to get something similar to a URL rewrite without the IIS UrlRewrite module? Any possible solution on our side?

EPTamminga commented 4 years ago

@thabaum The .aspx/.xml change can be handled by a rule in the web.config.

With this rule, any request for sitemap.xml will be handled by sitemap.aspx:

<rewrite>
    <rules>
        <rule name="SiteMap"
         enabled="true"
         patternSyntax="Wildcard"
         stopProcessing="true">
            <match url="*sitemap.xml" />
            <action type="Rewrite"
              url="{R:1}sitemap.aspx"
              appendQueryString="true" />
        </rule>
    </rules>
</rewrite>

valadas commented 4 years ago

@thabaum

> The format inside the file I believe is just like an XML file.

Yes, not only the format but the MIME type too. I think what happens is that Google now supports multiple sitemap formats and wrongly relies on the file name instead of the returned MIME type to decide the type of sitemap.

valadas commented 4 years ago

For those who know more about it than I do, could URL routing (like that used for service routes) be used in a similar fashion to point requests for SiteMap.xml to SiteMap.aspx transparently?

Something along the lines of:

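// Untested sketch; RouteCollection comes from System.Web.Routing.
public static void RegisterRoutes(RouteCollection routes)
{
    routes.MapPageRoute(
        "sitemap",        // route name
        "sitemap.xml",    // URL requested by the crawler
        "~/sitemap.aspx"  // page that actually renders the sitemap
    );
}

trouble2 commented 4 years ago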

I know Sacha Trauwaen made the Open URL Rewriter module, which (among some other nice additions) took care of this: https://github.com/sachatrauwaen/OpenUrlRewriter

See here for all the nice additions:

SEO
  - clean URLs (http://www.mysite.com/en/mypage/myarticle)
  - no duplicate content
  - exclude non-relevant content from search engines
  - clean and complete sitemap, also for multi-language

Site quality
  - no broken links
  - page not found (404) and redirection (301, 302) detection

URL Rewriter
  - Rewriting of the language part of the URL (www.mysite.com/fr/...)
  - Rewriting the page part of the URL independent of the page name
  - Rewriting for module parameters
  - Rewriting modules without page name
  - Rewriting of all URLs of the home page to the portal alias
  - Removing the file extension

URL Redirector
  - Hold all the history of URLs for redirecting
  - No automatic URL change on page name change
  - Making automatic redirections of the old DNN URLs
  - Making end-user custom redirections

URL Analyser
  - Log URLs for troubleshooting

URL Sitemap generator
  - sitemap generation
  - sitemap with alternate links for multi-language
  - rewriting of sitemap.xml to sitemap.aspx

Meta data generator
  - Meta Robots replacing (for login, register, terms, privacy pages)
  - Excluding the site from Google during development
  - Annotation rel="alternate" hreflang="x"

mikesmeltzer commented 4 years ago

@valadas I'm not sure whether routing would work without trying it, but this could be done using a URL Provider extension. I'm going to be doing something similar soon for a portal-level robots.txt file. If this is still open when I work on the robots.txt, I'll take a look and see if it would work in a similar way.

wbonekamp commented 4 years ago

A rewrite is a workaround option, but I hoped it could be fixed in the DNN core.

I assume many DNN users have this problem, and the sitemap functionality is quite essential to get your DNN website indexed the right way.

dieterdtx commented 4 years ago

I only just found this thread. I was the one who posted this on the Google forum (since GSC does not have real support). I will update my question there with the remark about the file extension.

Thank you @EPTamminga for the rewrite solution/workaround.

EdwardGraham commented 4 years ago

Will this:

https://github.com/dnnsoftware/Dnn.Platform/issues/3277#issuecomment-555410133

create an .xml file, and if so, where do I place it in web.config? Please forgive me, I am a newbie.

EPTamminga commented 4 years ago

@EdwardGraham See my comment with instructions, posted here on 19 Nov.

EdwardGraham commented 4 years ago

I saw that, and my question was where to put it in the web.config.

EPTamminga commented 4 years ago

In the <system.webServer> section.

Just search Google for “how to add redirect rule in web config” and you will find several sets of instructions.
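
For readers with the same question, here is a minimal sketch of where the rule from earlier in this thread would sit in web.config. It assumes the IIS URL Rewrite module is installed (the rule has no effect without it) and leaves out every other setting of a real DNN web.config. If the file already contains a <rewrite> section, presumably only the <rule> element should be added to the existing <rules> list rather than adding a second <rewrite> block.

```xml
<!-- Sketch only: placement of the sitemap rule inside web.config.
     Assumes the IIS URL Rewrite module is installed; other DNN settings are omitted. -->
<configuration>
  <system.webServer>
    <!-- existing system.webServer settings stay as they are -->
    <rewrite>
      <rules>
        <!-- Rewrites any URL ending in sitemap.xml to sitemap.aspx in the same folder -->
        <rule name="SiteMap" enabled="true" patternSyntax="Wildcard" stopProcessing="true">
          <match url="*sitemap.xml" />
          <action type="Rewrite" url="{R:1}sitemap.aspx" appendQueryString="true" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```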

EdwardGraham commented 4 years ago

EPT, thank you. It goes right under the <system.webServer> tag and seems to be working. Now off to Google to try and register the site. Thank you.

stale[bot] commented 4 years ago

We have detected this issue has not had any activity during the last 90 days. That could mean this issue is no longer relevant and/or nobody has found the necessary time to address the issue. We are trying to keep the list of open issues limited to those issues that are relevant to the majority and to close the ones that have become 'stale' (inactive). If no further activity is detected within the next 14 days, the issue will be closed automatically. If new comments are posted and/or a solution (pull request) is submitted for review that references this issue, the issue will not be closed. Closed issues can be reopened at any time in the future. Please remember those participating in this open source project are volunteers trying to help others and creating a better DNN Platform for all. Thank you for your continued involvement and contributions!

mikesmeltzer commented 4 years ago

While there is a workaround, given that Google Search Console is quite popular and a lot of solutions use .xml sitemaps, I think we should look at adding .xml support out of the box for this.

Please go ahead and assign it to me, I will take a look at a potential resolution for it in the coming weeks that doesn't require new core dependencies.

EdwardGraham commented 4 years ago

Thank you!

valadas commented 4 years ago

Awesome @mikesmeltzer Assigned to you and triaged.

wbonekamp commented 4 years ago

Looks like Google Search Console does accept sitemap.aspx again

mikesmeltzer commented 4 years ago

> Looks like Google Search Console does accept sitemap.aspx again

Thanks for the info, glad this is no longer an impediment.

andyduo commented 4 years ago

Yes, I was just successful in submitting the sitemap as well.

valadas commented 4 years ago

Awesome. So was this a Google bug and do we close this issue, or do we keep it open because someone still wanted to implement a solution?

Timo-Breumelhof commented 4 years ago

IMO it would not be bad to serve it with an XML extension, to prevent future issues.

EPTamminga commented 4 years ago

We could add a standard rule to the web.config that is used in the install pack.

thabaum commented 4 years ago

I think the platform should serve both file types... I think it did in the past, or I have had this in my dreams as a solution for a long time... So if you type Sitemap.xml or Sitemap.aspx it would show the same file, since they are exactly the same.

Would it work as a solution to create a second alias URL, one for each of the two file types, so that users can type the sitemap URL either way and the XML sitemap content is loaded regardless of which extension is used?

valadas commented 4 years ago

Well, it is a bit more complex than an alias: .xml is a static file type that does not go through the normal pipeline for dynamic files. So we would need some kind of URL handler to catch that URL and pipe it through some dynamic handling code.
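
As a rough illustration of the kind of URL handler meant here, one option in IIS integrated pipeline mode is a handler mapping in web.config that sends requests for sitemap.xml to managed code. This is only a sketch under stated assumptions: the type Example.SitemapXmlHandler and the assembly Example.Web are hypothetical placeholders, not types that exist in DNN, and the code that would actually render the sitemap is not shown.

```xml
<!-- Illustrative only: Example.SitemapXmlHandler and Example.Web are hypothetical
     names, not types that ship with DNN. -->
<configuration>
  <system.webServer>
    <handlers>
      <!-- Route requests for sitemap.xml to a managed handler instead of serving a static file -->
      <add name="SitemapXml"
           path="sitemap.xml"
           verb="GET,HEAD"
           type="Example.SitemapXmlHandler, Example.Web"
           preCondition="integratedMode" />
    </handlers>
  </system.webServer>
</configuration>
```

Unlike the rewrite rule, a mapping like this would not depend on the optional URL Rewrite module, but it would still need a handler class to be written and shipped.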

mitchelsellers commented 4 years ago

I personally would not support the addition of an additional alias to this file. It should have one name, if we want to change to sitemap.xml we can, but that would be a breaking change for others. There is no impact with regards to it being sitemap.aspx.

Timo-Breumelhof commented 4 years ago

> I personally would not support the addition of an additional alias to this file. It should have one name, if we want to change to sitemap.xml we can, but that would be a breaking change for others. There is no impact with regards to it being sitemap.aspx.

True, but Google apparently expects the .xml extension (not completely unreasonable IMO). So I think it would be good to avoid issues in the future, in case Google changes their "policy" again?

EPTamminga commented 4 years ago

A rule in web.config solves this case? No need to change sitemap.aspx?

mitchelsellers commented 4 years ago

Per Google's guidelines here: https://support.google.com/webmasters/answer/183668?hl=en

There is no requirement to name it .xml. The format has to be one of the many supported ones, but the name isn’t noted as a requirement.

Adding a redirect to web.config depends on an optional IIS component and could be a breaking change. It also adds a second redirect manager to all requests.

rodrigoratan commented 3 years ago

If the .xml extension is really needed, why not add a button that generates the file? If the file already exists, it would warn the user and ask whether they want to replace it.

jasclar commented 3 years ago

why not just add "Sitemap: https://www.exampledomainhere.com/sitemap.aspx" into your robots.txt ?

dieterdtx commented 3 years ago

> why not just add "Sitemap: https://www.exampledomainhere.com/sitemap.aspx" into your robots.txt ?

There was a period (a couple of months) during which Google Search Console did not accept the .aspx extension. From the end of March 2020 onward, GSC accepted it again.