dijs / wiki

Wikipedia Interface for Node.js
MIT License

How to get Wikipedia content using Wikipedia's URL? #146

Closed: Prottoy2938 closed this issue 3 years ago

Prottoy2938 commented 3 years ago

Is it possible to create an input field where you can paste a Wikipedia page link and get all the text content from that page?

I'm trying to add a feature to my web application where people paste the link (URL) of the Wikipedia page they want to analyze into an input field, and the application uses that URL to fetch all the text content from that page.

Suppose the user inputs this link: https://en.wikipedia.org/wiki/Taylor_Swift

The application will return the text content of that page, like this:

Taylor Alison Swift (born December 13, 1989) is an American singer-songwriter. Her narrative songwriting, which often centers around her personal life, has received widespread media coverage. Born in West Reading, Pennsylvania, Swift relocated to Nashville, Tennessee in 2004 to pursue a career in country music. At age 14, she became the youngest artist signed by the Sony/ATV Music publishing house, and at age 15, she signed her first record deal. Her 2006 eponymous debut studio album was the longest-charting album of the 2000s on the Billboard 200. Its third single, "Our Song", made her the youngest .......

I've gone through the Wikipedia API and haven't found a way to do this (yet). Any suggestions on how to do it?
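One option for plain text that doesn't go through this library at all is MediaWiki's Action API with the TextExtracts extension (prop=extracts with explaintext), which English Wikipedia has enabled. The sketch below builds that request and reads the extract. It assumes the title has already been pulled out of the pasted link, and a runtime with a global fetch (a browser or Node 18+); the helper names buildExtractUrl and fetchPlainText are hypothetical.

```javascript
// Hypothetical helper: builds a MediaWiki Action API URL asking the
// TextExtracts extension for the plain-text content of a single page.
function buildExtractUrl(title, apiUrl = 'https://en.wikipedia.org/w/api.php') {
  const url = new URL(apiUrl);
  url.search = new URLSearchParams({
    action: 'query',
    prop: 'extracts',
    explaintext: '1',   // plain text instead of HTML
    redirects: '1',     // follow redirects to the target article
    titles: title,
    format: 'json',
    formatversion: '2', // "pages" comes back as an array
    origin: '*',        // allow anonymous cross-origin requests from a browser
  }).toString();
  return url.toString();
}

// Hypothetical helper: fetches the extract text (needs a global fetch).
async function fetchPlainText(title, apiUrl) {
  const res = await fetch(buildExtractUrl(title, apiUrl));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  const page = data.query && data.query.pages && data.query.pages[0];
  if (!page || page.missing || page.invalid) {
    throw new Error(`No page titled "${title}"`);
  }
  return page.extract;
}
```

Usage would look like fetchPlainText('Taylor Swift').then(console.log). Full-page extracts can only be requested one title at a time, which is why the sketch takes a single title.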

kieckhafer commented 2 years ago

Came across your issue while searching for some answers to something else.

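Not sure if you've figured this out yet, but here's how I fetch page data from a Wikipedia link in my React app (this example retrieves the main image; logging the page object shows the other functions available):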

import { useEffect, useState } from 'react';
import wiki from 'wikijs';

// Inside a function component:
const [image, setImage] = useState(null);

useEffect(() => {
  const wikipediaLink = 'https://en.wikipedia.org/wiki/Page_Name';

  // Strip the base URL, leaving only the page title ("Page_Name")
  const pageTitle = wikipediaLink.replace('https://en.wikipedia.org/wiki/', '');

  if (pageTitle) {
    // The default apiUrl is http; pass the https endpoint to call the API over https
    wiki({ apiUrl: 'https://en.wikipedia.org/w/api.php' })
      .page(pageTitle)
      .then(async (page) => {
        // Log the page to see all the functions available for retrieving what you need
        console.log(page);

        const mainImage = await page.mainImage();
        setImage(mainImage);
      })
      .catch((e) => console.error(e));
  }
}, []); // empty dependency array: fetch once on mount, not after every render
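The string replace above only works for desktop English links with nothing after the title. A slightly more forgiving sketch, using the standard URL class, also accepts other language editions, mobile links, query strings, fragments and percent-encoded titles. The function name titleFromWikipediaLink is hypothetical and not part of wikijs.

```javascript
// Hypothetical helper (not part of wikijs): turns a pasted article link into
// a page title, or returns null if it isn't a Wikipedia /wiki/ URL.
function titleFromWikipediaLink(link) {
  let url;
  try {
    url = new URL(String(link).trim());
  } catch {
    return null; // not an absolute URL
  }

  // Accept any language or mobile host, e.g. en.wikipedia.org, de.m.wikipedia.org
  if (!/\.wikipedia\.org$/.test(url.hostname)) return null;
  if (!url.pathname.startsWith('/wiki/')) return null;

  // url.pathname already excludes "?query" and "#fragment"
  const encoded = url.pathname.slice('/wiki/'.length);
  if (!encoded) return null;

  try {
    // Undo %XX escapes, then show underscores as spaces
    return decodeURIComponent(encoded).replace(/_/g, ' ');
  } catch {
    return null; // malformed percent-encoding
  }
}
```

In the effect above, const pageTitle = titleFromWikipediaLink(wikipediaLink) could stand in for the replace call, and the existing if (pageTitle) check would then skip anything that isn't an article link. Links of the form /w/index.php?title=... are not handled by this sketch.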
dijs commented 2 years ago

@kieckhafer Thanks for answering this.

A couple things to add here:

  1. You do not need to set the apiUrl property if you are only using the English version of Wikipedia.
  2. If you would like to see all the available functions on the page object, you can find them in the docs: https://dijs.github.io/wiki/
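For links to other language editions, apiUrl does have to point at that edition's API. A minimal sketch, assuming the endpoint follows the same https://<host>/w/api.php pattern as the English one used earlier in this thread; apiUrlFromLink is a hypothetical name.

```javascript
// Hypothetical helper: derives the Action API endpoint for whichever
// Wikipedia language edition a pasted link points at, so it can be passed
// as the apiUrl option. Returns null for links outside *.wikipedia.org.
function apiUrlFromLink(link) {
  let url;
  try {
    url = new URL(link);
  } catch {
    return null;
  }
  if (!/\.wikipedia\.org$/.test(url.hostname)) return null;

  // Map mobile hosts (xx.m.wikipedia.org) to the desktop host.
  const host = url.hostname.replace(/\.m\.wikipedia\.org$/, '.wikipedia.org');

  // Always request over https, whatever scheme the pasted link used.
  return `https://${host}/w/api.php`;
}
```

It would be used as wiki({ apiUrl: apiUrlFromLink(link) }); for an English link it yields the same https endpoint passed explicitly in the React example.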
kieckhafer commented 2 years ago

Thanks @dijs. The reason I had to add the apiUrl was to get HTTPS to work. I was getting an error between my site and the API because my site is served over HTTPS while the library's defaultOptions point the API at http.

I created a PR that changes the default to use // (a protocol-relative URL), so the request inherits the protocol of the page making it. I think this would fix the issue I was seeing without forcing me to pass in the apiUrl param: https://github.com/dijs/wiki/pull/161
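To illustrate why // could help: a protocol-relative URL takes its scheme from the base URL it is resolved against, which in a browser is the current page. The sketch below shows this with the standard URL class; resolveApiUrl and my-site.example are hypothetical. One caveat worth checking: outside a browser (for example in Node) there is no page URL to inherit from, so the bare //host form cannot be resolved on its own. How the library's HTTP client handles that case is not something verified here.

```javascript
// Sketch: resolve an API endpoint the way a browser would, against the URL
// of the page making the request. pageUrl is optional; without it a
// protocol-relative "//host/path" has no scheme to inherit.
function resolveApiUrl(apiUrl, pageUrl) {
  try {
    return pageUrl === undefined
      ? new URL(apiUrl).toString()
      : new URL(apiUrl, pageUrl).toString();
  } catch {
    return null; // e.g. "//host/path" with no base URL (typical outside a browser)
  }
}
```

An HTTPS page resolves //en.wikipedia.org/w/api.php to an https endpoint, while a page served from http://localhost resolves it to http, so the same default works in both places.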