dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.58k stars 1.46k forks

[docs] - Very hard to search and understand search results on Dagster Docs site #3900

Open kinghuang opened 3 years ago

kinghuang commented 3 years ago

Use Case

I find it really hard to search for things and discover/understand what the search results are offering when using the Dagster Docs site quick search. The general text search across all the docs delivers poor search results.

Example 1: partitions

| Search Term | Intent |
| --- | --- |
| partitions | I would like a refresher on how to use pipeline partitions. |

Results:

  1. Partitions: API docs for the Partitions.
  2. Partitions #: Conceptual docs for Partitions.
  3. Defining a Partition Set #: A subsection of search result 2.
  4. Partitions in Dagit #: A subsection of search result 2.
  5. The Partitions Tab #: A subsection of search result 4.

Screenshot: search results for "partitions"

Example 2: PartitionSetDefinition

| Search Term | Intent |
| --- | --- |
| PartitionSetDefinition | I am looking for documentation on the PartitionSetDefinition class. |

Results:

  1. List[Union[PipelineDefinition, PartitionSetDefinition, …: The arguments to the dagster.repository decorator.
  2. You can define a PartitionSetDefinition…: Fragment from Partitions concept page.
  3. Partition-based schedules generate…: Fragment from Schedules concept page.
  4. … string that's supplied to PartitionSetDefinition…: Description from dagster.create_offset_partition_selector.
  5. … PipelineDefinition, ScheduleDefinition, SensorDefinition…: Fragment from Repositories concept page.

Screenshot: search results for "PartitionSetDefinition"

Example 3: materializer

| Search Term | Intent |
| --- | --- |
| materializer | I want to implement a type materializer. |

Results:

  1. materializer (Optional[DagsterTypeMaterializer])…: An argument of dagster_pandas.create_dagster_pandas_dataframe_type.
  2. materializer (Optional[DagsterTypeMaterializer])…: An argument of dagster.DagsterType.
  3. … solid logic. Dagster calls this facility…: A snippet of Custom Materializing Data Types.
  4. … on solids, on loaders and materializers…: A description of dagster.Field.
  5. Launch a run whenever another pipeline…#: From Sensors overview.

Screenshot: search results for "materializer"

Ideas of Implementation

In no particular order:

Additional Info

This was a problem on the old docs site, too.


Message from the maintainers:

Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.

schrockn commented 3 years ago

I agree: https://github.com/dagster-io/dagster/issues/3899

helloworld commented 3 years ago

@kinghuang this is amazing feedback! I appreciate the amount of detail here; it makes it really easy to iterate and improve search.

I just pushed a change that implements most of your #1 and #3 implementation ideas. Let me know what you think. Will also continue making improvements over the rest of the day. Would you be up for doing another pass early next week?

What's clear is that within the API docs, we need to rank class and function matches above argument-name matches.

Here's what searching for "Partitions" gives now:

"PartitionSetDefinition":

"Materializer":

kinghuang commented 3 years ago

Awesome. That was fast! 😄

I just tried it out. The groupings and the replacement of the former "Section" text are huge improvements. I agree that priority ranking of the API results by a hierarchy like module, class, function/method, attribute would be great. The same idea would probably help the other areas, too (e.g., headings then paragraphs).
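To make the ranking idea concrete, here is a minimal sketch of ordering API hits by kind first and text relevance second. Everything in it is an assumption for illustration: `SearchHit`, `API_KIND_PRIORITY`, `rank_api_hits`, and the scores are hypothetical and are not part of the Dagster docs search implementation. The "argument" tier is added below the module/class/function/attribute hierarchy to cover the argument-name matches from the examples.

```python
from dataclasses import dataclass

# Hypothetical priority table: a lower number ranks higher.
API_KIND_PRIORITY = {
    "module": 0,
    "class": 1,
    "function": 2,
    "method": 2,
    "attribute": 3,
    "argument": 4,
}


@dataclass
class SearchHit:
    title: str
    kind: str     # one of the keys above, as tagged by a (hypothetical) indexer
    score: float  # text relevance from the search backend; higher is better


def rank_api_hits(hits):
    """Sort hits by kind priority, then by descending relevance score."""
    unknown = len(API_KIND_PRIORITY)  # unrecognized kinds sort last
    return sorted(
        hits,
        key=lambda h: (API_KIND_PRIORITY.get(h.kind, unknown), -h.score),
    )


# Made-up scores: the argument match has the best text score but still
# ends up below the class and function matches.
hits = [
    SearchHit("dagster.repository (argument)", "argument", 0.9),
    SearchHit("dagster.create_offset_partition_selector", "function", 0.8),
    SearchHit("dagster.PartitionSetDefinition", "class", 0.7),
]
ranked = rank_api_hits(hits)
for hit in ranked:
    print(hit.kind, hit.title)
```

The same function could plausibly serve the concept pages by swapping in a table such as `{"heading": 0, "paragraph": 1}`.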

I'll be heavy on Dagster-related stuff at work this week and next week, so I'm definitely up for another pass on the docs next week. Thanks!