dotansimha / graphql-code-generator

A tool for generating code based on a GraphQL schema and GraphQL operations (query/mutation/subscription), with flexible support for custom plugins.
https://the-guild.dev/graphql/codegen/
MIT License

[typescript] Add the ability to only output types used by the queries #6959

Open jerelmiller opened 3 years ago

jerelmiller commented 3 years ago

Hey there 👋

First of all, thanks so much for this amazing library. I've really enjoyed how flexible and configurable this tool is.

I'd love to add the ability to generate only the TypeScript types used by our operations. Currently the entire GraphQL schema plus the query/mutation types (with typescript-operations) are generated and saved to the file. This works great for small to medium-sized GraphQL APIs, but gets overwhelming for large ones.

I'm working with a very large GraphQL API to generate types for my UI. The UI I'm building uses a small fraction of the GraphQL schema, yet everything in the GraphQL schema is included in my types file. To give you a hard number, our types file has 132,000+ lines of code after codegen.

Because that file is so big, it has a heavy performance impact on a few of our tools, such as prettier. We have tooling in place that automatically runs prettier on files before a commit, and we've noticed that this file eats up a huge chunk of that time. Because we also lint our commit messages, a minor commit message error can result in a feedback cycle of more than 10 seconds, since prettier runs on the file before the commit message is linted.

As a workaround, we've considered adding this file to something like a .prettierignore (and the equivalent for other tools), but we really appreciate the prettified code because it's much easier to read in PR diffs. Ideally we'd be able to keep everything we have right now, but with a much smaller file. This would also make it easy for someone to scan that file and understand what our UI actually uses. As it stands, the file is too big for any human reader to fully grasp.

I'd love for this to be considered as a new configuration option that can be enabled. Thanks!

dotansimha commented 3 years ago

At the moment, you can set preResolveTypes: true and onlyOperationTypes: true to reduce the amount of code and types generated.
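For reference, the two options would go into a codegen.ts config roughly as shown below. This is a sketch. The schema and document paths are placeholders, and the option comments paraphrase the plugin documentation. Depending on your typescript-operations version, preResolveTypes may already default to true. In a real project you would normally type the object as CodegenConfig from @graphql-codegen/cli. That import is left out here so the snippet stands alone.

```typescript
// codegen.ts: hypothetical schema/document paths, for illustration only.
const config = {
    schema: 'schema.graphql',
    documents: 'src/**/*.graphql',
    generates: {
        'src/generated/graphql.ts': {
            plugins: ['typescript', 'typescript-operations'],
            config: {
                // typescript plugin: emit only the types operations need (basically enums and scalars)
                onlyOperationTypes: true,
                // typescript-operations plugin: inline field types instead of Pick<> from schema types
                preResolveTypes: true,
            },
        },
    },
};

export default config;
```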

myknbani commented 1 year ago

Seems to be on the roadmap:

Only generate actually used types

IntusFacultas commented 8 months ago

For what it's worth, I've managed to implement this functionality as a post-generation cleanup script. I can't take the time to create an MR, but I will share my code here, documented as well as I can. I welcome anyone to adapt it into a functional MR for this library :)

Assumptions

  1. Your local operation files are defined as *.graphql files
  2. All your operations are named
  3. You've exported where your operation files are located from your codegen configuration file.
  4. You're leveraging the __typename field and the value of it matches the type names generated
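Because of assumption 3, the scripts below import GRAPHQL_DOCUMENT_LOCATIONS and TYPES_FILE from the codegen config. A hypothetical codegen.ts that satisfies this might look like the following sketch.

- All paths are made up.
- The typescript-react-apollo plugin is an inference, not something the author states. It is suggested by the *QueryHookResult and *MutationFn names that cullUnusedTypes looks up.
- getAllGraphQLFiles strips the literal *{graphql,ts} suffix, so each location needs to end with exactly that pattern.

```typescript
// codegen.ts (hypothetical paths). The cleanup scripts import these two constants,
// so they are exported next to the config instead of being inlined into it.
export const GRAPHQL_DOCUMENT_LOCATIONS = ['src/graphql/queries/*{graphql,ts}', 'src/graphql/mutations/*{graphql,ts}'];
export const TYPES_FILE = 'src/generated/graphql.ts';

const config = {
    schema: 'schema.graphql',
    documents: GRAPHQL_DOCUMENT_LOCATIONS,
    generates: {
        [TYPES_FILE]: {
            // Assumed plugin set: the hook/MutationFn suffixes point at typescript-react-apollo
            plugins: ['typescript', 'typescript-operations', 'typescript-react-apollo'],
        },
    },
};

export default config;
```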

Overall Approach

  1. Generate one giant file with all your types and enums for the entire graph schema.
  2. Collect and parse the graphql files to get all your operation names
  3. Parse the giant file into an AST
  4. For each of your operations, recursively collect all the types and enums they use (let's call this DepsA)
  5. For each of the types collected (excluding Query and Mutation, which by definition depend on every type), recursively collect all the types and enums they use, so that a valid type definition can be generated for each of them (let's call this DepsB, which is a superset of DepsA)
  6. Delete from the giant file all top level types and enums not included in DepsB.
  7. (Optional, but if you want minimal size you need this) For the remaining types, delete any property that references a type not explicitly mentioned in DepsA
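Steps 4 to 6 amount to a reachability computation. The sketch below is an in-memory illustration, not the author's code. It models the generated file as a plain map from each declaration name to the names it references, which is a hypothetical stand-in for the TypeScript AST. It computes DepsA and DepsB with a worklist. Like the author's script, it always keeps Query and Mutation but never expands them.

```typescript
// Hypothetical model of the generated file: declaration name -> names it references.
type DepGraph = Map<string, string[]>;

// Query and Mutation reference (almost) every type, so they are kept but never expanded.
const ROOT_TYPES = ['Query', 'Mutation'];

/** Step 4: each operation type plus everything its generated type references. */
export function collectDepsA(graph: DepGraph, operationTypes: string[]): Set<string> {
    const depsA = new Set<string>();
    for (const op of operationTypes) {
        depsA.add(op);
        for (const dep of graph.get(op) ?? []) depsA.add(dep);
    }
    return depsA;
}

/** Step 5: transitive closure of DepsA via a worklist, never expanding Query/Mutation. */
export function collectDepsB(graph: DepGraph, depsA: Set<string>): Set<string> {
    const depsB = new Set<string>([...ROOT_TYPES, ...depsA]);
    const worklist = [...depsA].filter(type => !ROOT_TYPES.includes(type));
    while (worklist.length > 0) {
        const current = worklist.pop()!;
        for (const dep of graph.get(current) ?? []) {
            if (depsB.has(dep)) continue; // already collected; this also stops cycles
            depsB.add(dep);
            worklist.push(dep);
        }
    }
    return depsB;
}

/** Step 6: top-level declarations outside DepsB can be deleted. */
export function typesToDelete(graph: DepGraph, depsB: Set<string>): string[] {
    return [...graph.keys()].filter(name => !depsB.has(name));
}
```

For example, with Query referencing User, Post, Comment and Admin, and a GetUserQuery that only references User, Comment and Admin end up in the deletion list. Post and Role survive because User references them.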

Code

constants.ts

export const BASE_TYPES_TO_PRESERVE = ['Maybe', 'InputMaybe', 'Exact', 'MakeOptional', 'MakeMaybe', 'Scalars'];
export const BASE_QUERY_TYPE = 'Query';
export const BASE_MUTATION_TYPE = 'Mutation';

cullUnusedTypes.ts

This script does all the work for paring down the types, leveraging all the various util files I defined.

/**
 * Context: https://github.com/dotansimha/graphql-code-generator/issues/6959
 *
 * Graphql Codegen is great but it can't be configured to only resolve the types and enums that we
 * are personally using so it generates all of our Graph's contract, which then gets bundled into our
 * base bundle and is pretty large. So we try to remove as much code as possible
 */

import { findAssociatedTypes } from './utils/findAssociatedTypes';
import { getAllGraphQLFiles } from './utils/getAllGraphQLFiles';
import { getOperationNameFromFile } from './utils/getOperationNameFromFile';
import { getUnusedGeneratedTypes } from './utils/getUnusedGeneratedTypes';
import { pruneGeneratedTypes } from './utils/pruneGeneratedTypes';
import { BASE_TYPES_TO_PRESERVE, BASE_QUERY_TYPE, BASE_MUTATION_TYPE } from './constants';

export const cullUnusedTypes = () => {
    const allGraphQLFiles = getAllGraphQLFiles();
    const operations = allGraphQLFiles.map(file => getOperationNameFromFile(file));
    const allTypesDirectlyReferenced = operations.flatMap(operation =>
        Array.from(
            new Set([
                ...findAssociatedTypes(`${operation}Query`),
                ...findAssociatedTypes(`${operation}QueryResult`),
                ...findAssociatedTypes(`${operation}QueryHookResult`),
                ...findAssociatedTypes(`${operation}LazyQueryHookResult`),
                ...findAssociatedTypes(`${operation}QueryVariables`),
                ...findAssociatedTypes(`${operation}Mutation`),
                ...findAssociatedTypes(`${operation}MutationOptions`),
                ...findAssociatedTypes(`${operation}MutationFn`),
                ...findAssociatedTypes(`${operation}MutationHookResult`),
            ])
        )
    );
    const fullSetOfTypes = new Set<string>([BASE_QUERY_TYPE, BASE_MUTATION_TYPE, ...allTypesDirectlyReferenced]);

    // We don't recurse through Query and Mutation, since they are defined as the superset of the entire graph schema,
    // so if we recursed through them, we'd just bring every single type straight back in.
    const typesToRecurse = [...allTypesDirectlyReferenced].filter(
        type => type !== BASE_QUERY_TYPE && type !== BASE_MUTATION_TYPE
    );

    while (typesToRecurse.length) {
        const typeToRecurse = typesToRecurse.pop()!;

        const additionalTypes = findAssociatedTypes(typeToRecurse);
        const newTypes = additionalTypes.filter(
            potentiallyNewType => !fullSetOfTypes.has(potentiallyNewType) && potentiallyNewType !== typeToRecurse
        );
        additionalTypes.forEach(additionalType => fullSetOfTypes.add(additionalType));
        typesToRecurse.push(...newTypes);
    }
    const allUnusedTypes = getUnusedGeneratedTypes(
        Array.from(new Set([...BASE_TYPES_TO_PRESERVE, ...Array.from(fullSetOfTypes)]))
    );
    pruneGeneratedTypes(allUnusedTypes, allTypesDirectlyReferenced);
};

cullUnusedTypes();

findAssociatedTypes.ts

This function takes in a type name, finds the type declaration for that type name, then recurses through the type declaration, consuming all the __typename fields to find what other types need to be preserved.

import * as ts from 'typescript';
import { loadGeneratedTypes } from './loadGeneratedTypes';

const EXPLICITLY_IGNORED_NODE_TYPES = [
    ts.isTypeParameterDeclaration,
    ts.isIndexedAccessTypeNode,
    ts.isPropertyDeclaration,
    ts.isVoidExpression,
    ts.isLiteralTypeNode,
];

/**
 * Walks a type definition's AST to collect all the types relevant to the query
 */
const recurseThroughNodesAndCollectTypes = (node: ts.Node | undefined, types: string[]): string[] => {
    if (!node || EXPLICITLY_IGNORED_NODE_TYPES.some(typeguard => typeguard(node))) {
        return types;
    }
    if (ts.isPropertySignature(node) && ts.isIdentifier(node.name) && node.name.text === '__typename') {
        /**
         * Easy way to extract a requested type given that __typename maps to another type somewhere
         */
        const type = node.type as ts.LiteralTypeNode;
        return [...types, (type.literal as ts.LiteralExpression).text];
    }
    if (ts.isTypeLiteralNode(node)) {
        /**
         * Recurse through type definition to grab all the properties and determine what nested types are used in situations
         * like
         *
         * type Foo = {
         *      __typename: 'Something',
         *      .. other members
         * }
         */
        return node.members.reduce(
            (acc, childNode) => Array.from(new Set([...acc, ...recurseThroughNodesAndCollectTypes(childNode, types)])),
            types
        );
    }
    if (ts.isUnionTypeNode(node!) || ts.isIntersectionTypeNode(node)) {
        /**
         * Recurse through the individual members of a top level union or intersection type definition, like Hello and World
         * in this example
         *
         * type MyType = Hello & World;
         */
        return node!.types.reduce(
            (acc, childNode) => Array.from(new Set([...acc, ...recurseThroughNodesAndCollectTypes(childNode, types)])),
            types
        );
    }
    if (ts.isTypeReferenceNode(node) && node.typeArguments) {
        /**
         * To be able to extract SomeName from situations like the following
         * {
         *      someProperty: Array<Array<{
         *          __typename: "SomeName";
         *          someOtherStuff: unknown;
         *      }>>
         * }
         */
        return node.typeArguments.reduce(
            (acc, childNode) => Array.from(new Set([...acc, ...recurseThroughNodesAndCollectTypes(childNode, types)])),
            types
        );
    }
    if (ts.isTypeReferenceNode(node) && ts.isIdentifier(node.typeName)) {
        /**
         * Handle type references to extract Hello World from situations like this
         *
         * type MyType = Hello & World;
         */
        return [...types, node.typeName.text];
    }
    if (ts.isPropertySignature(node)) {
        return recurseThroughNodesAndCollectTypes(node.type, types);
    }
    return types;
};

/**
 * Parses the generated types from Codegen to find all the types that are relevant for a given
 * operation or type.
 */
export const findAssociatedTypes = (operationOrTypeName: string) => {
    let typeNode: ts.TypeAliasDeclaration | ts.EnumDeclaration;
    ts.forEachChild(loadGeneratedTypes()!, node => {
        const isMatchingTypeDeclaration = ts.isTypeAliasDeclaration(node) && node.name.text === operationOrTypeName;
        const isMatchingEnumDeclaration = ts.isEnumDeclaration(node) && node.name.text === operationOrTypeName;
        if (isMatchingTypeDeclaration || isMatchingEnumDeclaration) {
            typeNode = node;
            // short circuit execution
            return true;
        }
    });
    if (!typeNode!) {
        return [operationOrTypeName];
    }
    if (ts.isEnumDeclaration(typeNode!)) {
        return [operationOrTypeName, typeNode.name.text];
    }
    const relevantTypes: string[] = [operationOrTypeName, ...recurseThroughNodesAndCollectTypes(typeNode!.type, [])];
    return relevantTypes;
};
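findAssociatedTypes handles each node kind it cares about explicitly. That is easy to follow, but it skips any shape it doesn't list. A __typename typed as a union of literals would even fail at runtime, because the LiteralTypeNode cast has no .literal to read. For comparison, here is a sketch of the same collection done with a generic ts.forEachChild walk over a source string. collectReferencedNames is a hypothetical helper. Its rules are an interpretation of the original, not a confirmed spec:

- skip indexed accesses like Scalars['ID']
- don't record wrapper names like Maybe or Array
- accept literal unions in __typename

```typescript
import * as ts from 'typescript';

/**
 * Collects the __typename string literals (including unions such as 'A' | 'B') and the
 * plain named type references inside one type alias. Generic wrappers (Maybe<T>, Array<T>)
 * and indexed accesses (Scalars['ID']) are not recorded themselves.
 */
export function collectReferencedNames(source: string, aliasName: string): string[] {
    const sourceFile = ts.createSourceFile('generated.ts', source, ts.ScriptTarget.Latest, true);
    const alias = sourceFile.statements.find(
        (stmt): stmt is ts.TypeAliasDeclaration => ts.isTypeAliasDeclaration(stmt) && stmt.name.text === aliasName
    );
    if (!alias) return [aliasName];

    const names = new Set<string>([aliasName]);

    const collectStringLiterals = (node: ts.Node): void => {
        if (ts.isLiteralTypeNode(node) && ts.isStringLiteral(node.literal)) names.add(node.literal.text);
        ts.forEachChild(node, collectStringLiterals);
    };

    const visit = (node: ts.Node): void => {
        if (ts.isPropertySignature(node) && ts.isIdentifier(node.name) && node.name.text === '__typename') {
            if (node.type) collectStringLiterals(node.type);
            return;
        }
        if (ts.isIndexedAccessTypeNode(node)) return; // e.g. Scalars['ID']
        if (ts.isTypeReferenceNode(node) && !node.typeArguments && ts.isIdentifier(node.typeName)) {
            names.add(node.typeName.text);
        }
        ts.forEachChild(node, visit);
    };

    visit(alias.type);
    return Array.from(names);
}
```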

getAllGraphQLFiles.ts

This function takes the document locations defined in your codegen configuration file (in my case ../../../codegen) and collects all the GraphQL files at those locations. I have hardcoded the *{graphql,ts} suffix, but in practice we only use .graphql files and I haven't tested what happens with a .ts file.

import fs from 'fs';
import path from 'path';
import { GRAPHQL_DOCUMENT_LOCATIONS } from '../../../codegen';

/**
 * Gets all the GraphQL files' resolved file paths defined for codegen to generate types on based on the exported
 * files configuration from the codegen config
 */
export const getAllGraphQLFiles = () => {
    /**
     * This part is brittle to the exact format we define the document locations as.
     */
    const directories = GRAPHQL_DOCUMENT_LOCATIONS.map(directory => directory.replace('*{graphql,ts}', ''));
    const files = directories.flatMap(directory => {
        const files = fs.readdirSync(directory);
        return files
            .filter(file => {
                const stat = fs.statSync(path.resolve(directory, file));
                return stat.isFile();
            })
            .map(file => path.resolve(directory, file));
    });
    return files;
};
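As noted, the listing above depends on the exact *{graphql,ts} suffix. It also returns every file in each directory, whatever its extension. One alternative is to walk the directories recursively and filter by extension. listFilesWithExtensions is a hypothetical helper, sketched here with only Node's fs and path modules:

```typescript
import * as fs from 'fs';
import * as path from 'path';

/** Recursively lists absolute paths of files under `dir` whose extension is in `extensions`. */
export function listFilesWithExtensions(dir: string, extensions: string[]): string[] {
    return fs.readdirSync(dir, { withFileTypes: true }).flatMap(entry => {
        const fullPath = path.resolve(dir, entry.name);
        if (entry.isDirectory()) {
            return listFilesWithExtensions(fullPath, extensions);
        }
        // path.extname includes the leading dot, e.g. '.graphql'
        return entry.isFile() && extensions.includes(path.extname(entry.name)) ? [fullPath] : [];
    });
}
```

You could call it in place of the readdirSync/statSync pair, e.g. directories.flatMap(dir => listFilesWithExtensions(dir, ['.graphql'])). That would also keep stray non-GraphQL files away from the parser in getOperationNameFromFile.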

getOperationNameFromFile.ts

This function loads a .graphql file into memory, parses it to a GQL AST and extracts the operation definition name so we can look up the relevant types for that operation

import gql from 'graphql-tag';
import { OperationDefinitionNode } from 'graphql';
import fs from 'fs';

/**
 * Parses the file at the given filePath into a GraphQL AST to extract the operation name defined
 * in the file. Presumes only one operation name defined per file
 */
export const getOperationNameFromFile = (filePath: string) => {
    const fileContent = fs.readFileSync(filePath);
    const graphqlAST = gql(fileContent.toString());
    const { definitions } = graphqlAST;
    /**
     * Presumption is that only one operation is defined per file.
     */
    const operationDefinition = definitions.find(definition => definition.kind === 'OperationDefinition');
    const { name } = operationDefinition as OperationDefinitionNode;
    const { value } = name!;
    return value;
};
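The function above throws when a file has no operation definition or only an anonymous one. That covers fragment-only files and files with an anonymous query. It also only reports the first operation when a file defines several. Below is a sketch of a more tolerant variant that returns every named operation in a document. It uses parse and Kind from the graphql package instead of graphql-tag. That is a swapped-in choice, not the author's code:

```typescript
import { parse, Kind } from 'graphql';
import type { OperationDefinitionNode } from 'graphql';

/**
 * Returns the names of all named operations in a GraphQL document.
 * Fragments and anonymous operations are skipped instead of causing an exception.
 */
export function getOperationNames(source: string): string[] {
    return parse(source)
        .definitions.filter(
            (definition): definition is OperationDefinitionNode => definition.kind === Kind.OPERATION_DEFINITION
        )
        .flatMap(definition => (definition.name ? [definition.name.value] : []));
}
```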

getUnusedGeneratedTypes.ts

This function loads the giant generated file into memory, parses the TS AST, then iterates over every top-level enum declaration and type alias declaration to collect the ones that aren't in the allow list. Conveniently, this leaves all the generated hooks and documents untouched (which by definition we know are going to be used).

import * as ts from 'typescript';
import { loadGeneratedTypes } from './loadGeneratedTypes';

/**
 * Loads the generated types and collects all types and enums that are not in the used types passed in
 */
export const getUnusedGeneratedTypes = (usedTypes: string[]) => {
    const sourceFile = loadGeneratedTypes();
    const typesToPrune: string[] = [];
    ts.forEachChild(sourceFile, node => {
        if (
            (ts.isEnumDeclaration(node) && !usedTypes.includes(node.name.text)) ||
            (ts.isTypeAliasDeclaration(node) && !usedTypes.includes(node.name.text))
        ) {
            typesToPrune.push(node.name.text);
        }
    });
    return typesToPrune;
};

loadGeneratedTypes.ts

This function is just in charge of loading the giant generated types file into memory and parsing it into a TS AST.

import * as ts from 'typescript';
import fs from 'fs';
import path from 'path';
import { TYPES_FILE } from '../../../codegen';

declare global {
    // eslint-disable-next-line no-var
    var sourceFile: ts.SourceFile | undefined;
}

/**
 * Parses the generated types file into a TS AST. The result is cached on the process
 * so the (very large) file is only read and parsed once per run.
 */
export const loadGeneratedTypes = (): ts.SourceFile => {
    if (!global.sourceFile) {
        const file = path.resolve(TYPES_FILE);
        // createSourceFile parses a single file without building a whole ts.Program
        global.sourceFile = ts.createSourceFile(file, fs.readFileSync(file, 'utf8'), ts.ScriptTarget.Latest, true);
    }
    return global.sourceFile;
};

pruneGeneratedTypes.ts

This is where we actually delete unused types and do the final optional step I mentioned of paring down existing type definitions to their minimally defined subset.

import * as ts from 'typescript';
import fs from 'fs';
import path from 'path';
import { BASE_TYPES_TO_PRESERVE } from '../../constants';
import { Terminal } from '../../../utils/io';
import { TYPES_FILE } from '../../../../codegen';
import { loadGeneratedTypes } from '../loadGeneratedTypes';

/**
 * Utilizes the passed in typesToKeep to strip out unused properties (if any) of the node.
 * If the node is entirely unusable, will return null. Mutates the node in place.
 */
export const getUtilizedSubsetOfNode = (node: ts.Node | null | undefined, typesToKeep: string[]): ts.Node | null => {
    if (!node) {
        return null;
    }

    /**
     * If it's an enum, we delete it if it isn't explicitly used
     */
    if (ts.isEnumDeclaration(node)) {
        if (!typesToKeep.includes(node.name.text)) {
            return null;
        }
        return node;
    }

    /**
     * If it's a type alias declaration, we pare down the type, and if we pare it down to nothing,
     * we delete the declaration
     */
    if (ts.isTypeAliasDeclaration(node)) {
        const paredDownTypeAlias = getUtilizedSubsetOfNode(node.type, typesToKeep);
        if (!paredDownTypeAlias) {
            return null;
        }
        Object.defineProperty(node, 'type', {
            configurable: true,
            value: paredDownTypeAlias,
        });
        return node;
    }
    /**
     * If it's a type literal node, we iterate through its members to remove any unused members.
     */
    if (ts.isTypeLiteralNode(node)) {
        const usedMembers = node.members
            .map(childNode => getUtilizedSubsetOfNode(childNode, typesToKeep))
            .filter(childNode => !!childNode);
        Object.defineProperty(node, 'members', {
            configurable: true,
            value: usedMembers,
        });
        return node;
    }

    /**
     * If it's a union or intersection, we iterate through the unioned or intersected types and
     * remove any that are unused.
     */
    if (ts.isUnionTypeNode(node!) || ts.isIntersectionTypeNode(node)) {
        const usedTypes = node.types
            .map(childNode => getUtilizedSubsetOfNode(childNode, typesToKeep))
            .filter(childNode => !!childNode);

        if (!usedTypes.length) {
            return null;
        }
        Object.defineProperty(node, 'types', {
            configurable: true,
            value: usedTypes,
        });
        return node;
    }

    /**
     * If it's a type reference node with arguments, then it's going to be a Maybe, Exact, Array
     * etc. So we look inside the arguments to find the types and remove unused members there.
     */
    if (ts.isTypeReferenceNode(node) && node.typeArguments) {
        const usedTypeArguments = node.typeArguments
            .map(childNode => getUtilizedSubsetOfNode(childNode, typesToKeep))
            .filter(childNode => !!childNode);

        if (!usedTypeArguments.length) {
            return null;
        }
        Object.defineProperty(node, 'typeArguments', {
            configurable: true,
            value: usedTypeArguments,
        });
        return node;
    }

    /**
     * If it's a type reference with a type name, then we look to see if it's a type name we
     * removed, to see if we can delete the node
     */
    if (ts.isTypeReferenceNode(node) && ts.isIdentifier(node.typeName)) {
        if (!typesToKeep.includes(node.typeName.text)) {
            return null;
        }
        return node;
    }

    /**
     * If it's a property signature, we delete it based on whether we can delete the type, otherwise
     * we update the type value to the minimum subset and return it
     */
    if (ts.isPropertySignature(node)) {
        const newType = getUtilizedSubsetOfNode(node.type, typesToKeep);
        if (!newType) {
            return null;
        }
        Object.defineProperty(node, 'type', {
            configurable: true,
            value: newType,
        });
        return node;
    }

    /**
     * If it's none of the above, we leave it be
     */
    return node;
};

/**
 * Side Effects: Mutates the generated types file to remove the types passed in as a parameter
 */
export const pruneGeneratedTypes = async (typesToRemove: string[], directlyUsedTypes: string[]) => {
    const sourceFile = loadGeneratedTypes();
    const nodesToKeep: ts.Node[] = [];
    const nodesToNotMutate: ts.Node[] = [];
    Terminal.info('\nDeleting unused enums and types');
    ts.forEachChild(sourceFile, node => {
        const isTypeOrEnum = ts.isEnumDeclaration(node) || ts.isTypeAliasDeclaration(node);
        if (isTypeOrEnum && BASE_TYPES_TO_PRESERVE.includes(node.name.text)) {
            nodesToNotMutate.push(node);
        } else if (isTypeOrEnum && !typesToRemove.includes(node.name.text)) {
            nodesToKeep.push(node);
        } else if (!isTypeOrEnum) {
            nodesToKeep.push(node);
        }
    });
    Terminal.success('✓ Unused enums and types deleted\n');
    const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });
    Terminal.info('Paring down remaining types to their minimal subset');
    const cleanedUpNodes = nodesToKeep
        .map(node => getUtilizedSubsetOfNode(node, directlyUsedTypes))
        .filter((node, index, nodes): node is NonNullable<typeof node> => {
            if (!node) {
                return false;
            }
            /**
             * Delete the comments for deleted nodes
             */
            if (
                nodes[index + 1] === null &&
                (node.kind === ts.SyntaxKind.MultiLineCommentTrivia ||
                    node.kind === ts.SyntaxKind.SingleLineCommentTrivia)
            ) {
                return false;
            }
            return true;
        });
    Terminal.success('✓ Types pared down\n');
    Terminal.info('Writing graphql file to file system');
    fs.writeFileSync(
        path.resolve(TYPES_FILE),
        [...nodesToNotMutate, ...cleanedUpNodes]
            .map(node => printer.printNode(ts.EmitHint.Unspecified, node, sourceFile))
            .join('\n')
    );
    Terminal.success('✓ File saved');
};
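pruneGeneratedTypes edits nodes in place with Object.defineProperty. TypeScript also offers a transformer API (ts.transform with ts.visitEachChild), which builds updated nodes for the printer instead of patching existing ones. The sketch below applies a simplified version of step 7 that way. It works on a source string rather than TYPES_FILE. It only handles named references and generic wrappers such as Maybe<T> and Array<T>, and leaves unions and intersections as they are. pruneProperties is a hypothetical name and not part of the original script:

```typescript
import * as ts from 'typescript';

/** Removes property signatures whose type names a declaration that is not in `typesToKeep`. */
export function pruneProperties(source: string, typesToKeep: Set<string>): string {
    const sourceFile = ts.createSourceFile('generated.ts', source, ts.ScriptTarget.Latest, true);

    // A named reference is dropped if it is not kept; a wrapper such as Maybe<T> or
    // Array<T> is dropped only if all of its type arguments are dropped.
    const isDropped = (type: ts.TypeNode): boolean => {
        if (ts.isTypeReferenceNode(type) && type.typeArguments && type.typeArguments.length > 0) {
            return type.typeArguments.every(isDropped);
        }
        if (ts.isTypeReferenceNode(type) && ts.isIdentifier(type.typeName)) {
            return !typesToKeep.has(type.typeName.text);
        }
        return false; // Scalars['ID'], literal types, inline object types, ... are kept
    };

    const transformer: ts.TransformerFactory<ts.SourceFile> = context => {
        const visit = (node: ts.Node): ts.Node | undefined => {
            if (ts.isPropertySignature(node) && node.type && isDropped(node.type)) {
                return undefined; // returning undefined removes the member from its type literal
            }
            return ts.visitEachChild(node, visit, context);
        };
        return root => ts.visitEachChild(root, visit, context);
    };

    const result = ts.transform(sourceFile, [transformer]);
    const output = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed }).printFile(result.transformed[0]);
    result.dispose();
    return output;
}
```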

Hopefully this helps someone! It was quite challenging to develop, but it certainly paid dividends. Got my generated types file from 30K lines down to 8K.